LOCAL ASSESSMENT OF PERTURBATIONS

By

GLEN LAWSON HARTLESS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2000

Copyright 2000 by Glen Lawson Hartless

To Christy

ACKNOWLEDGMENTS

I would first like to thank my chairman, Jim Booth, and cochairman, Ramon Littell, for all of their help and guidance. Their combined knowledge, insight, and mentoring have been extraordinary. I would also like to express my gratitude to them for allowing me to pursue avenues of research that are probably closer to my heart than theirs. Thanks also go to the other esteemed professors who served on my supervisory committee: James Algina, Randy Carter, and James Hobert. I would also like to acknowledge people who have influenced me academically and professionally over the years: Chuck Nowalk, John Grant, and many others in the Phillipsburg and Lopatcong school systems; Walter Pirie and the faculty at Virginia Tech; Ron DiCola at AT&T/Lucent; Larry Leemis at the College of William & Mary; Danny Perkins, Bo Beaulieu, and many others at the University of Florida; and also fellow students Jonathan Hartzel, Phil McGoff, and Galin Jones, among many others. I would also like to thank Allan West for maintaining an extremely reliable computing environment. Thanks also go to my parents for their love and support and for instilling in me the discipline needed to achieve my goals. Finally, my utmost thanks go to my wife, Christine. I am forever grateful for her unequivocal love, unending patience, and thoughtful insight.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTERS

1 INTRODUCTION TO LOCAL INFLUENCE
1.1 Premise
1.2 Measuring the Effects of Perturbations
1.3 Local Influence Diagnostics
1.4 Perturbations in Regression
1.5 Basic Perturbations
1.6 Brief Literature Review
1.7 Outline

2 COMPUTATIONAL ISSUES IN LOCAL INFLUENCE ANALYSIS
2.1 Curvatures and the Direction of Maximum Curvature
2.2 Local Influence Analysis of a Profile Likelihood Displacement
2.3 Building Blocks of Local Influence for Regression
2.4 Perturbations to the Response in Regression
2.5 Unequal Measurement Precision Applied to Regression
2.6 Benchmarks for Perturbation Analyses
2.7 Local Assessment Under Reparameterizations

3 PERTURBATIONS TO THE X MATRIX IN REGRESSION
3.1 Perturbations to Explanatory Variables
3.2 Perturbations to a Single Column
3.3 Perturbations to a Single Row
3.4 Perturbations to the Entire Design Matrix

4 LOCAL ASSESSMENT FOR OPTIMAL ESTIMATING FUNCTIONS
4.1 Inferential Platform
4.2 Local Assessment of Perturbations
4.3 Acceleration, Curvature and Normal Curvature
4.4 Algebraic Expressions for Local Assessment
4.5 Computational Formulas for J and P
4.6 Local Assessment for Maximum Likelihood Inference
4.7 First-order Assessment of the Log Generalized Variance
4.8 Local Assessment for Quasi-likelihood Inference

5 LOCAL ASSESSMENT OF PREDICTIONS
5.1 Prediction via First Principles
5.2 Predictive Likelihood
5.3 Influence Measures for Prediction
5.4 Estimation and Prediction in Linear Mixed Models
5.5 Local Influence Methods for Linear Mixed Models
5.6 One-way Random Effects Model
5.7 Residual Variance Perturbation Scheme
5.8 Cluster Variance Perturbation Schemes
5.9 Application to Corn Crop Data

6 MULTI-STAGE ESTIMATIONS AND FUTURE RESEARCH
6.1 Local Assessment in Multi-stage Estimations
6.2 Synopsis and Additional Future Research

APPENDICES
A MODEL FITTING INFORMATION FOR SCHOOL DATA
B MODEL FITTING INFORMATION FOR CORN CROP DATA

REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

LOCAL ASSESSMENT OF PERTURBATIONS

By Glen Lawson Hartless

August 2000
Chairman: James G. Booth
Major Department: Statistics

Statistical models are useful and convenient tools for describing data. However, models can be adversely affected by mistaken information, poor assumptions, and small subsets of influential data. Seemingly inconsequential changes to the observed data, the postulated model, or the inference technique may greatly affect the scientific conclusions made from model-fitting. This dissertation considers local assessment of perturbations in parametric and semi-parametric models. The effects of perturbation are assessed using a Taylor-series approximation in the vicinity of the fitted model, leading to graphical diagnostics that allow the analyst to identify sensitive parameter estimates and sources of undue influence. First, computational formulas are given to assess the influence of perturbations on linear combinations of the parameter estimates for maximum likelihood. A diagnostic plot that identifies sensitive fitted values and self-predicting observations in linear regression is introduced. Second, the eigensystem of the acceleration matrix for perturbing each element of the design matrix in linear regression is derived. This eigensystem leads to a diagnostic plot for identifying specific elements that are influential, and it is shown how the results relate to principal components, collinearity, and influence diagnostics.
Next, it is shown that the conceptual framework and computational tools extend easily from maximum likelihood to inference that is performed using estimating equations. In particular, local assessment of parameter estimates and standard errors derived from optimal estimating functions is addressed. Algebraic expressions and computational formulas are given for both first- and second-order assessment, and applications such as quasi-likelihood are considered. Finally, the techniques are applied to linear mixed models in order to assess the influence of individual observations and clusters. While previous applications considered only estimation, the effects of perturbations on both estimation and prediction are addressed. It is shown that the effects on the fixed and random effect inferences can be assessed simultaneously by using an influence measure based on the unstandardized predictive log-likelihood of the random effects.

CHAPTER 1
INTRODUCTION TO LOCAL INFLUENCE

1.1 Premise

George Box (1982, p. 34) stated "all models are wrong; some models are useful," and A. F. Desmond (1997b, p. 117) stated "a robust method ought not to perform poorly in neighborhoods of the nominally true model." Indeed, conclusions based on a statistical model should be reliable under small perturbations to the data, the model's assumptions, or the particulars of the estimation method. In a seminal paper, Cook (1986) utilized the basic concept of perturbing the postulated model or the observed data to develop a variety of diagnostic tools. He showed how small perturbations can be used to develop easily-computable diagnostics for almost any model estimated by maximum likelihood (ML) estimation.

1.2 Measuring the Effects of Perturbations

In a statistical analysis, parameter estimates are a crucial part of the decision-making process. These estimates are a function of the observed data, the postulated model, and the analyst's choice of statistical inference technique.
Any one of these three aspects of a statistical-modeling situation is subject to uncertainty and doubt: observed data may be incorrectly recorded or subject to rounding error, a slightly different model may be just as plausible as the postulated model, and nuisance aspects of modeling such as choosing hyperparameters are subject to second-guessing. If small changes in these aspects of a statistical modeling problem create large changes in parameter estimates, the analysis is unreliable. Therefore, the change in parameter estimates due to perturbation can serve as a barometer for the robustness of the scientific conclusions.

Following Cook (1986), a vector of $q$ perturbations is denoted as $\omega = (\omega_1, \omega_2, \ldots, \omega_q)'$, and the $q$-dimensional space of possible perturbations is denoted as $\Omega$. Included in this space is one point, $\omega_0$, that does not change the original formulation of the problem; this is called the null perturbation. The perturbations are not parameters to be estimated, but rather known real-valued quantities introduced into the problem by the analyst. These perturbations might represent measurement errors, unequal variances, case weights, correlated errors, missing predictors, or some other change to the data, model, or inference method. Prudent choice of perturbations can lead to useful diagnostics. For example, if parameter estimates are sensitive to perturbing the contribution (weight) of a particular observation to estimation, it indicates that the observation is overly influential on the estimates.

For data $y$ and parameters $\theta$, let $L(\theta; y)$ denote the log-likelihood from the original postulated model, and let $L_\omega(\theta; y)$ denote the log-likelihood resulting from the inclusion of perturbations. If the perturbation specified is the null perturbation, then the two likelihoods are equivalent. The maximum likelihood estimators (MLEs) under these two likelihoods are denoted as $\hat{\theta}$ and $\hat{\theta}_\omega$, respectively.
Cook (1986) used the likelihood displacement to summarize parameter estimate changes under perturbation:

$$LD(\omega) = 2[L(\hat{\theta}; y) - L(\hat{\theta}_\omega; y)]. \quad (1.1)$$

From inspection of (1.1), we see that the likelihood displacement is the difference of the original likelihood evaluated at two values: the original MLEs and the perturbed MLEs. Because $\hat{\theta}$ maximizes the original likelihood, $LD(\omega)$ is non-negative and achieves its minimum value of zero when $\omega = \omega_0$. As the perturbed MLEs become increasingly different from the original MLEs, $LD(\omega)$ becomes larger. Thus, the size of $LD(\omega)$ provides a measure of how much the perturbation affects parameter estimation.

Effects on a subset of parameters, $\theta_1$ from $\theta' = (\theta_1', \theta_2')$, can be obtained by using the profile likelihood displacement:

$$LD^{[\theta_1]}(\omega) = 2[L(\hat{\theta}_1, \hat{\theta}_2; y) - L(\hat{\theta}_{1\omega}, g(\hat{\theta}_{1\omega}); y)], \quad (1.2)$$

where $\hat{\theta}_{1\omega}$ is from $\hat{\theta}_\omega' = (\hat{\theta}_{1\omega}', \hat{\theta}_{2\omega}')$, and $g(\hat{\theta}_{1\omega})$ is the vector that maximizes the profile log-likelihood, $L(\hat{\theta}_{1\omega}, \theta_2)$, with respect to $\theta_2$. The profile likelihood displacement allows influence to be "directed" through $\theta_1$, with the other parameter estimates only changing in sympathy. In the next section it is shown how to obtain diagnostic tools from a local influence analysis of the likelihood displacement and the profile likelihood displacement.

1.3 Local Influence Diagnostics

Although we could simply choose values for $\omega$ and compute $LD(\omega)$, this is not particularly useful for two reasons. First, $LD(\omega)$ is often not a closed-form function of $\omega$. Hence, if we wish to find its value, re-fitting the model may be necessary. Second, even if calculating $LD(\omega)$ were easy, we must somehow decide what values to use for $\omega$. This choice would be highly data-dependent and a nuisance for the analyst. Therefore, we would like a methodology that can examine the information contained in the function $LD(\omega)$ and yet does not require a choice of actual values for $\omega$ nor involve re-estimating the model parameters.
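To make definition (1.1) concrete, the following minimal sketch (Python with NumPy; simulated data and helper names are illustrative, not from the dissertation) computes $LD(\omega)$ for a normal linear model by re-fitting after adding a perturbation $\omega$ to the response. It shows that $LD(\omega_0) = 0$ and that $LD(\omega)$ is positive away from the null perturbation.

```python
import numpy as np

def fit_mle(X, y):
    # Closed-form MLEs for the normal linear model: beta-hat and sigma2-hat = SSE/n
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    return beta, resid @ resid / len(y)

def loglik(X, y, beta, sigma2):
    n = len(y)
    resid = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * resid @ resid / sigma2

def likelihood_displacement(X, y, omega):
    # LD(omega) = 2[ L(theta-hat; y) - L(theta-hat_omega; y) ]:
    # the ORIGINAL likelihood evaluated at the original and perturbed MLEs
    beta0, s0 = fit_mle(X, y)
    betaw, sw = fit_mle(X, y + omega)   # re-fit with a perturbed response
    return 2 * (loglik(X, y, beta0, s0) - loglik(X, y, betaw, sw))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=20)

print(likelihood_displacement(X, y, np.zeros(20)))   # 0.0 at the null perturbation
omega = np.zeros(20)
omega[0] = 0.5
print(likelihood_displacement(X, y, omega))          # positive away from it
```

Note that every evaluation of $LD(\omega)$ requires a re-fit; the curvature-based approach developed next avoids this.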
Following Cook (1986), this can be accomplished using the notion of curvature. Simple plots can then be used to identify whether estimation is sensitive to any particular elements of the $q$-dimensional perturbation.

Figure 1.1: Schematic of a plane curve with its tangent line. The angle between the tangent line and the x-axis is $\phi$.

1.3.1 Curvature of a Plane Curve

Curvature is a measure of how quickly a curve turns or twists at a particular point (Swokowski 1988). Let $(f(t), g(t))$ be a smooth plane curve. The arc length parameter $s$ of such a curve is the distance travelled along the curve from a fixed point $A$ to another point on the curve (Figure 1.1). Letting $t_A$ denote the value of $t$ that yields point $A$, $s$ can be computed as

$$s(t) = \int_{t_A}^{t} \left[ (f'(u))^2 + (g'(u))^2 \right]^{1/2} du, \quad (1.3)$$

and we may express the curve in terms of $s$: $(f(t(s)), g(t(s)))$, where $t(s)$ denotes the inverse of the arc-length function in (1.3). Additionally, let $T(s)$ be the tangent line at a point on the curve, and let $\phi$ denote the angle between $T(s)$ and the x-axis. Curvature is defined as the absolute value of the rate of change in $\phi$ with respect to $s$:

$$C = \left| \frac{d\phi}{ds} \right|. \quad (1.4)$$

A plane curve that is turning quickly at $(f(t), g(t))$ has a larger curvature than a curve that is nearly straight at that point.

Figure 1.2: Curve with two circles of curvature

The reciprocal of curvature is called the radius of curvature, and it is the radius of the best-fitting circle at that point on the curve. These circles of curvature have the same first and second derivatives and tangent line as the curve and lie on its concave side. Figure 1.2 shows the curve $(\tfrac{1}{2}\sin(t), \tfrac{1}{2}\sin(2t))$ and two of its circles of curvature. The one on the left occurs at one value of $t$, while the other occurs when $t = \arccos(0)$. Their curvatures are 6 and 8/9, respectively. The expression "turning on a dime" refers to a sharp turn: one in which the radius of curvature is small, and consequently the curvature of the path is large.
The formula for curvature is

$$C = \frac{|f'(t)g''(t) - g'(t)f''(t)|}{\left[ (f'(t))^2 + (g'(t))^2 \right]^{3/2}}. \quad (1.5)$$

We are particularly interested in assessing curvature at the local minimum of a function from $x$ to $y$, i.e. for the curve $(t, g(t))$ with $x = t$. For this special case, curvature measures how quickly the function increases from that minimum value. Because $g'(x_0) = 0$ at a minimum, (1.5) simplifies to the acceleration of the curve:

$$C = \left| \frac{d^2 g(x)}{dx^2} \right|_{x = x_0}, \quad (1.6)$$

where $g(x_0)$ is the function's minimum.

Figure 1.3: Likelihood displacement with directional vectors

This result can be applied to $LD(\omega)$ at $\omega_0$ when there is only a single perturbation, $\omega$. Here, the curvature is given by

$$C = \left| \frac{d^2 LD(\omega)}{d\omega^2} \right|_{\omega = \omega_0}, \quad (1.7)$$

and it measures how quickly the parameter estimates change when $\omega$ is moved away from the null perturbation. A large curvature indicates that even if $\omega$ is close to, or local to, $\omega_0$, the likelihood displacement increases substantially. Hence, a large curvature indicates that even a slight perturbation can be very influential on parameter estimates.

1.3.2 Using Plane Curves to Describe $LD(\omega)$

The curvature of plane curves can also be used to describe $LD(\omega)$ when $q > 1$. Now $\omega$ can be moved away from $\omega_0$ in different directions, as shown by the vectors in Figure 1.3 for a two-dimensional perturbation. These vectors can be written as

$$\omega_v(a) = \omega_0 + a v, \quad (1.8)$$

where the vector $v$ determines the direction and the value $a$ determines the position relative to $\omega_0$.

Figure 1.4: Intersection of plane and surface creates space curve

Let $P_v$ denote a plane that contains $\omega_v(a)$ and extends directly upward from $\Omega$ to intersect $LD(\omega)$ (Figure 1.4). The intersection of $P_v$ and $LD(\omega)$ forms a space curve, $(\omega_v(a)', LD(\omega_v(a)))'$. Because this space curve is contained within the plane $P_v$, its curvature can be obtained as if it were a plane curve, $(a, LD(\omega_v(a)))$. Thus the curvature of $LD(\omega_v(a))$ at $\omega_0$ is simply

$$C_v = \left. \frac{d^2 LD(\omega_v(a))}{da^2} \right|_{a=0}. \quad (1.9)$$

Different directions produce curvatures of various sizes.
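Formula (1.5) is easy to check numerically. The short sketch below (Python; an illustration added here, not from the dissertation) evaluates it for a circle of radius $r$, whose curvature should equal $1/r$ at every point, consistent with the radius-of-curvature interpretation above.

```python
import numpy as np

def curvature(fp, gp, fpp, gpp, t):
    # Plane-curve curvature (1.5): C = |f'g'' - g'f''| / ((f'^2 + g'^2)^(3/2))
    num = abs(fp(t) * gpp(t) - gp(t) * fpp(t))
    den = (fp(t) ** 2 + gp(t) ** 2) ** 1.5
    return num / den

# Circle of radius r: (r cos t, r sin t). Its curvature is 1/r everywhere.
r = 2.0
C = curvature(lambda t: -r * np.sin(t), lambda t: r * np.cos(t),
              lambda t: -r * np.cos(t), lambda t: -r * np.sin(t), t=0.7)
print(C)  # ~0.5
```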
The normalized vector that produces the largest curvature is denoted $v_{max}$ and is of particular interest.¹

1.3.3 Direction of Maximum Curvature, $v_{max}$

The relative sizes of the elements of $v_{max}$ indicate which elements of $\omega$ are most influential on the parameter estimates. For example, if the $i$th element of $v_{max}$ is large, then $\omega_i$ is a key player in producing the largest curvature of $LD(\omega)$ at the null perturbation, i.e. in producing an immediate change in parameter estimates.

¹ More formally, these space curves are called normal sections. The term comes from the fact that each plane, $P_v$, contains the surface's principal normal vector, the vector that is orthogonal to the surface's tangent plane at $\omega_0$. Curvatures of the normal sections are referred to as normal curvatures.

The direction of maximum curvature is the main diagnostic of Cook's local influence methodology. Typically, $v_{max}$ is standardized to have length one and is then plotted against indices $1, 2, \ldots, q$. If any elements are particularly large relative to the others, they stand out in the plot. The analyst then may want to take steps to alleviate the sensitivity to the perturbation. What, if any, steps to take depends upon the perturbation scheme and the particulars of the data analysis. For some models and perturbation schemes, the formula for $v_{max}$ is known explicitly, while in other situations it requires some moderate computations. In analyses such as regression, $v_{max}$ is typically a function of familiar quantities.

1.4 Perturbations in Regression

Consider the normal linear regression model:

$$Y = X\beta + \epsilon, \quad (1.10)$$

where $Y$ is an $n \times 1$ vector of responses, $X$ is a known $n \times k$ model matrix of rank $k$, $\beta$ is a $k \times 1$ vector of unknown parameters, and $\epsilon$ is an $n \times 1$ vector of independent identically distributed normal random variables with mean 0 and variance $\sigma^2 > 0$. Hence, $Y \sim N(X\beta, \sigma^2 I_n)$, and $\theta = (\beta_1, \ldots, \beta_k, \sigma^2)'$.
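For reference in the examples that follow, the MLEs under model (1.10) have closed forms: $\hat{\beta} = (X'X)^{-1}X'y$ and $\hat{\sigma}^2 = e'e/n$ with $e = y - X\hat{\beta}$ (note the ML divisor $n$, not $n - k$). A minimal sketch with simulated data (Python; the data and names are illustrative, not from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.5 * rng.normal(size=n)

# beta-hat = (X'X)^{-1} X'y ; sigma2-hat = e'e / n  (maximum likelihood)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sigma2_hat = e @ e / n

print(beta_hat.round(2), round(sigma2_hat, 2))
```

The residual vector $e$ is orthogonal to the columns of $X$, a fact used repeatedly in the diagnostics below.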
1.4.1 Scheme 1: Measurement Error in the Response

In order to examine the effects of measurement error in the response, Schwarzmann (1991) introduced perturbations to the observed data. He considered fitting a model to the observed data set with slight perturbations: $y_i + \omega_i$, for $i = 1, \ldots, q$. Hence, $\omega_0$ is a vector of zeroes for this perturbation scheme. By letting $q = n$, all of the observations can be perturbed simultaneously. This allows the analyst to determine which observations would have the most effect on parameter estimates in the event that there is measurement or rounding error.

Table 1.1: Scottish hills race data

Observation  Record Time (min.)  Distance (miles)  Climb (ft)
 1            16.083              2.5                650
 2            48.350              6.0               2500
 3            33.650              6.0                900
 4            45.600              7.5                800
 5            62.267              8.0               3070
 6            73.217              8.0               2866
 7           204.617             16.0               7500
 8            36.367              6.0                800
 9            29.750              5.0                800
10            39.750              6.0                650
11           192.667             28.0               2100
12            43.050              5.0               2000
13            65.000              9.5               2200
14            44.133              6.0                500
15            26.933              4.5               1500
16            72.250             10.0               3000
17            98.417             14.0               2200
18            78.650              3.0                350
19            17.417              4.5               1000
20            32.567              5.5                600
21            15.950              3.0                300
22            27.900              3.5               1500
23            47.633              6.0               2200
24            17.933              2.0                900
25            18.683              3.0                600
26            26.217              4.0               2000
27            34.433              6.0                800
28            28.567              5.0                950
29            50.500              6.5               1750
30            20.950              5.0                500
31            85.583             10.0               4400
32            32.383              6.0                600
33           170.250             18.0               5200
34            28.100              4.5                850
35           159.833             20.0               5000

Source: Atkinson (1986)

1.4.1.1 Influence on the full vector of parameter estimates

Atkinson (1986) analyzed a data set of record times for 35 different hill races, as recorded by the Scottish Hill Runners Association (Table 1.1). Scatter plots indicate the positive association of the three variables: winning time, race distance, and race climb (Figure 1.5). The influence of perturbations on the parameter estimates from a regression of winning time on DISTANCE and CLIMB is of interest. For convenience, the regressors and response have been centered and scaled, thereby eliminating the need for an intercept.
Thus, the full vector of $p$ parameters is $\theta = (\beta_{dist}, \beta_{climb}, \sigma^2)'$. Figure 1.6 shows $v_{max}$ plotted against the indices 1 through $q = n = 35$ for the Scottish hills data. Element 18, which corresponds to perturbation of the 18th observation, is the dominant contributor to the direction of maximum curvature. This observation is from a short race with little CLIMB, and yet it has a large record time. Similar races have much smaller record times, indicating that race 18 is an outlier. Atkinson (1986) has actually suggested that the winning time for this race may have been 18.67 minutes rather than 1 hour and 18.67 minutes.

Interestingly, it can be shown that the direction of maximum curvature for Scheme 1 is proportional to the residuals. Thus, if we standardized the vector of residuals to length 1 and plotted them against case indices, we would obtain Figure 1.6. This implies that parameter estimates are most sensitive to a set of small additive measurement errors when those errors tend to exaggerate (or correct) the model's error in estimating each data point, with the most exaggeration (or correction) occurring for the most poorly predicted points.

Figure 1.5: Hills data; scatter-plots with highlighted cases. (a) Winning time vs. race distance; (b) winning time vs. race climb; (c) race climb vs. race distance.

Figure 1.6: Hills data; response perturbation; $v_{max}$ for $\hat{\theta}$
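The claim that $v_{max}$ for Scheme 1 is proportional to the residuals can be verified numerically using the acceleration matrix developed in Chapter 2. The sketch below (Python; simulated data rather than the hills data, and the analytic expressions for the derivative matrix and information matrix are standard normal-model results derived here rather than quoted from the text) builds $F = 2\Delta'(-\ddot{L}^{-1})\Delta$ and checks that its top eigenvector matches the unit-length residual vector up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 35, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.5, -2.0]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta
s2 = e @ e / n                           # MLE of sigma^2

# Rows of Delta are d^2 L_omega / (d theta_i d omega_j) at the fit:
#   beta block:    X' / sigma2-hat,   sigma^2 block: e' / sigma2-hat^2
Delta = np.vstack([X.T / s2, e[None, :] / s2**2])

# Observed information, block diagonal at the MLE
info = np.zeros((k + 1, k + 1))
info[:k, :k] = X.T @ X / s2
info[k, k] = n / (2 * s2**2)

F = 2 * Delta.T @ np.linalg.solve(info, Delta)   # acceleration matrix
vals, vecs = np.linalg.eigh(F)
v_max = vecs[:, -1]                              # direction of maximum curvature

r = e / np.linalg.norm(e)                        # unit residual vector
print(min(np.abs(v_max - r).max(), np.abs(v_max + r).max()))  # ~0
```

The comparison uses both signs because a standardized direction vector is only unique up to sign.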
Figure 1.7: Hills data; response perturbation. (a) $v_{max}$ for directed influence on $\hat{\beta}_{dist}$; (b) $v_{max}$ for directed influence on $\hat{\beta}_{climb}$.

1.4.1.2 Influence on a subset of parameter estimates

Plotting $v_{max}$ for the profile likelihood displacement, $LD^{[\theta_1]}(\omega)$, can indicate which components of $\omega$ affect a subset of parameters. Examining influence on $\hat{\theta}_1 = \hat{\sigma}^2$ results in the same $v_{max}$ as the original likelihood displacement, i.e. it is proportional to the residuals. The directions of maximum curvature for $\hat{\theta}_1 = \hat{\beta}_{dist}$ and $\hat{\theta}_1 = \hat{\beta}_{climb}$ are shown in Figures 1.7(a) and 1.7(b). The first plot indicates that small measurement error in race 11's winning time is most influential on $\hat{\beta}_{dist}$. In the plot of $v_{max}$ for $\hat{\beta}_{climb}$, elements 7 and 11 stand out. The fact that the two elements have opposite signs can be used for interpretation. They indicate that maximum curvature is primarily achieved by perturbing winning times 7 and 11 in opposite directions (i.e. making one observation smaller and the other larger). Thus, if the record winning times for races 7 and 11 were slightly closer together or farther apart, this would affect the slope estimate for CLIMB.

1.4.1.3 Plotting options

Note that the standardized $v_{max}$ is only unique up to sign change. For example, Figure 1.8 displays the same standardized vector as Figure 1.6, but with all of the elements' signs switched.

Figure 1.8: Hills data; alternate rendering of $v_{max}$

This suggests an alternative plot in which the signs of the elements are suppressed: a star plot. Here, each element is displayed as a point, with the distance from the origin equal to its absolute value. The $q$ elements are simply displayed around the point (0, 0), starting in the first quadrant and ending in the fourth quadrant. The plot is completed by labelling the points with indices.
Labeling all points can result in a cluttered plot (Figure 1.9(a)), and so a better option is to label only the largest elements (Figure 1.9(b)). In these renderings, element 18 is again noticeably large, but our attention is also drawn to element 7. This observation has the characteristics of an influential observation: it has the highest climb and one of the longest distances (Figure 1.5(c)), as well as the second largest residual. Unlike the index plot, the star plot does not give the illusion that the indices are in a specific order. However, the suppression of element signs can be an advantage or a disadvantage. The advantage is that when looking at several plots of directional vectors, suppressing the signs helps the analyst identify patterns by emphasizing element size. The disadvantage is that the diagnostic interpretation of the signs is lost.

Figure 1.9: Hills data; $|v_{max}|$ as a star plot. (a) All elements numbered; (b) largest elements numbered.

1.4.2 Scheme 2: Perturbations to Measurement Precision

Another assumption that can be perturbed is that of equal variance (Cook 1986). This is accomplished by letting $V(y_i) = \sigma^2/\omega_i$ for $i = 1, \ldots, n$. Here the null perturbation is $\omega_0 = 1_n$. With this perturbation scheme, we deem that some observations are observed with more precision than others. The resulting perturbed MLEs weight each case according to its perturbed variance. If certain cases are influential, then even a slight down-weighting or up-weighting of them produces large displacements.

This perturbation scheme has an important connection to the widely-used influence measure Cook's Distance (Cook 1977). Recall that the $i$th Cook's Distance is a measure of the change in regression parameter estimates that results from deleting the $i$th observation. Clearly a deleted observation provides no information for finding $\hat{\beta}$.
Similarly, as the $i$th perturbation is decreased from 1, the $i$th observation provides less and less information to the estimation process owing to its increased variance. Thus, case-deletion can be thought of as a rather drastic perturbation: one that results in a variance $\to \infty$ for the $i$th observation. Case-deletion is referred to as a global influence technique because it assesses the effect of perturbations that are not local to the null perturbation in $\Omega$.

1.4.2.1 Influence on the full vector of parameter estimates

Figure 1.10(a) shows the direction of maximum curvature for the Scottish hills data set. Again, element 18 dominates the direction of maximum curvature. This implies that the precision of observation 18 is the most influential on the full set of parameter estimates.

1.4.2.2 Influence on $\hat{\sigma}^2$

Using the profile likelihood displacement, influence directed upon $\hat{\sigma}^2$ alone can be assessed. Observation 18 is again implicated by $v_{max}$ as being most locally influential (Figure 1.10(b)).

1.4.2.3 Influence on individual regression parameter estimates

Figure 1.10(d) shows $v_{max}$ when isolating effects on $\hat{\beta}_{dist}$. Although there are several points that stand out, observation 7 is identified as being most influential since element 7 is large. This race is also implicated as being locally influential in the plot of $v_{max}$ for $\hat{\beta}_{climb}$ (Figure 1.10(c)). In fact, the DFBETAS diagnostic (Belsley et al. 1980) also identifies race 7 as being influential on $\hat{\beta}_{climb}$. Recall that the value of $DFBETAS_{i,j}$ measures how the $j$th parameter estimate is affected by deleting the $i$th observation. Writing the $n$ DFBETAS for the $j$th parameter estimate in vector form, it is shown in Section 2.5 that this vector is proportional to the following Hadamard (element-wise) product of vectors:
$$\begin{pmatrix} \frac{1}{1-h_{11}} \\ \frac{1}{1-h_{22}} \\ \vdots \\ \frac{1}{1-h_{nn}} \end{pmatrix} \odot \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix} \odot \begin{pmatrix} r_{1j} \\ r_{2j} \\ \vdots \\ r_{nj} \end{pmatrix}. \quad (1.11)$$

Here, $h_{ii}$ is the $i$th diagonal element of the "hat" matrix $X(X'X)^{-1}X'$, $r_i$ is the $i$th residual, and $r_{ij}$ is the $i$th residual from regressing $x_j$ on the other columns of $X$. It can be shown (see Section 2.5) that $v_{max}$ for directing the effects of the variance perturbation on $\hat{\beta}_j$ is proportional to

$$\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix} \odot \begin{pmatrix} r_{1j} \\ r_{2j} \\ \vdots \\ r_{nj} \end{pmatrix}. \quad (1.12)$$

The elements of the two vectors differ only by a factor of $\frac{1}{1-h_{ii}}$. In fact, standardizing the vector of 35 DFBETAS for $\hat{\beta}_{climb}$ to length 1 and plotting them (Figure 1.11) produces a figure nearly identical to Figure 1.10(c).

Figure 1.10: Hills data; variance perturbation. (a) $v_{max}$ for $\hat{\theta}$; (b) $v_{max}$ for $\hat{\sigma}^2$; (c) $v_{max}$ for $\hat{\beta}_{climb}$; (d) $v_{max}$ for $\hat{\beta}_{dist}$; (e) $v_{max}$ for $\hat{\beta}$; (f) $v_2$ for $\hat{\beta}$.

Figure 1.11: Standardized vector of DFBETAS for $\hat{\beta}_{climb}$

Influence on both regression parameter estimates

Figure 1.10(e) displays the direction of maximum curvature for directed influence on $\hat{\beta}$. Plot (f) is the direction of largest curvature under the constraint of orthogonality to $v_{max}$. The two plots are very similar to the directed influence plots in (c) and (d).

1.5 Basic Perturbations

As an alternative to finding the direction of maximum curvature, we may utilize basic perturbations. The $i$th basic perturbation is the same as the null perturbation except for an additive change of 1 in its $i$th element. For example, the first basic perturbation is $\omega_0 + (1, 0, \ldots, 0)'$, and corresponds to moving a distance of 1 along the first axis in $\Omega$. Typically, the curvatures for each of the $q$ basic perturbations would be computed and then plotted together.
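The relationship between (1.11) and (1.12) is easy to verify numerically. The sketch below (Python; simulated data rather than the hills data, with illustrative variable names) computes the $n$ DFBETA values for one coefficient by brute-force case deletion and checks that they are proportional to the element-wise product $r_i \, r_{ij}/(1 - h_{ii})$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, j = 30, 3, 1
X = rng.normal(size=(n, k))
y = X @ np.ones(k) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
r = y - X @ beta
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii

# DFBETA for coefficient j via brute-force deletion of each case
dfbeta = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    bi = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
    dfbeta[i] = beta[j] - bi[j]

# r_ij: residuals from regressing x_j on the remaining columns of X
others = np.delete(X, j, axis=1)
g = np.linalg.solve(others.T @ others, others.T @ X[:, j])
r_j = X[:, j] - others @ g

prod = r * r_j / (1 - h)
# Exact proportionality implies the correlation is +/- 1 (up to rounding)
print(abs(np.corrcoef(dfbeta, prod)[0, 1]))  # ~1
```

Dropping the $1/(1 - h_{ii})$ factor from `prod` gives the vector in (1.12) instead, which explains why Figure 1.11 and Figure 1.10(c) are nearly, but not exactly, identical.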
As an example, consider applying the variance perturbation to a linear regression and examining directed influence on $\hat{\beta}$. In this case, the $i$th basic curvature, $C_{b_i}$, is a local influence analog to the $i$th Cook's Distance, because it corresponds to re-weighting a single observation. It is shown in Section 2.7.2.1 that the $i$th basic perturbation for this scheme measures influence on the $i$th fitted value. Figure 1.12(a) plots the 35 curvatures for the basic perturbations applied to the Scottish hills data, while Figure 1.12(b) plots the 35 Cook's Distances. Although the scales differ, they provide similar information. Both plots draw attention to race 7, but they differ somewhat in their emphasis of races 11 and 18.

Figure 1.12: Hills data; $C_{b_i}$'s for variance perturbation and Cook's Distances. (a) Basic curvatures; (b) Cook's Distances.

1.6 Brief Literature Review

Local influence techniques have been applied to a number of statistical models, most notably univariate and multivariate linear models (Cook 1986; Cook and Weisberg 1991; Lawrance 1991; Schwarzmann 1991; Tsai and Wu 1992; Wu and Luo 1993a; Kim 1995; Kim 1996c; Tang and Fung 1996; Fung and Tang 1997;
1997), spUne smoothing (Thomas 1991), Box-Cox transformations (Lawrance 1988; Kim 1996b), nonlinear models (St. Laurent and Cook 1993; Wu and Wan 1994), measurement error models (Zhao and Lee 1995; Lee and Zhao 1996), and elliptical linear regression models (Galea et al. 1997). Many authors have presented modifications of, extentions to, and theoretical justifications for the basic methodology (Vos 1991; Schall and Gonin 1991; Schall and Dunne 1992; Farebrother 1992; Wu and Luo 1993a; Billor and Loynes 1993; Kim 1996a; Fung and Kwan 1997; Lu et al. 1997; Cadigan and Farrell 1999; Poon 20 and Poon 1999). Extentions to Bayesian methodology have also been considered (McCulloch 1989; Lavine 1992; Pan et al. 1996). ..'■■,"' 1.7 Outline In Chapter 2, computational formulas for local influence analysis are discussed and derived for the two perturbation schemes discussed in this chapter. Chapter 3 contains derivations and examples of local influence diagnostics for linear regression under perturbations to the X matrix. In Chapter 4, the methodology is extended to a general class of influence measures and estimation techniques. Computational tools are also provided. In Chapter 5, inference and local influence analysis for prediction problems are discussed. Techniques for assessing estimation and prediction in linear mixed models are developed and applied to data. In Chapter 6, some preliminary results that can be applied to multi-stage estimations are presented. The chapter also contains commentary on additional areas of future research. The appendices contain details for data analyses. CHAPTER 2 COMPUTATIONAL ISSUES IN LOCAL INFLUENCE ANALYSIS In this chapter, computational formulas for local influence are reviewed (Cook 1986; Beckman et al. 1987), To demonstrate the mechanics of the formulas, detailed derivations of previous results for the regression model (Cook and Weisberg 1991; Schwarzmann 1991) will be presented with minor additions. 
Next, the issue of how to obtain objective benchmarks is considered, and some novel solutions are outlined. Finally, local influence is discussed under reparameterizations of the perturbation space $\Omega$ and the parameter space $\Theta$. By applying previous results (Escobar and Meeker 1992; Pan et al. 1997), some new diagnostic tools for influence on fitted values in linear regression are derived.

2.1 Curvatures and the Direction of Maximum Curvature

Cook (1986) showed that the curvature of $LD(\omega)$ in direction $v$ can be expressed as

$$C_v = 2v'\Delta'(-\ddot{L}^{-1})\Delta v, \qquad (2.1)$$

where $-\ddot{L}$ is the observed information matrix of the original model,

$$\ddot{L} = \frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}\bigg|_{\theta=\hat\theta},$$

and the $p \times q$ matrix $\Delta$ has elements

$$\Delta_{ij} = \frac{\partial^2 L_\omega(\theta; y)}{\partial\theta_i\,\partial\omega_j}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0}$$

for $i = 1, \ldots, p$ and $j = 1, \ldots, q$. This result is proved in Chapter 4 as a special case of Theorem 4.2.

The $q \times q$ matrix $\ddot{F} = 2\Delta'(-\ddot{L}^{-1})\Delta$ is the acceleration matrix. It is easily obtained because $\Delta$ is a function of $L_\omega(\theta; y)$, and $-\ddot{L}^{-1}$ is typically computed as part of ML estimation since it serves as an estimated asymptotic covariance matrix for the parameter estimates. Hence, computing the curvature in a particular direction is straightforward. In addition, the diagonal elements of the acceleration matrix are the $q$ basic curvatures. Finally, it is well known how to obtain the maximum value of a quadratic form such as (2.1) (Johnson and Wichern 1992, pp. 66-68). Specifically, $C_{\max}$ is the largest eigenvalue of $\ddot{F}$, and $v_{\max}$ is the corresponding eigenvector. Hence, finding $C_{\max}$ and $v_{\max}$ is numerically feasible. The acceleration matrix can provide additional information. For example, the second largest eigenvalue is the largest curvature in a direction that is orthogonal to $v_{\max}$. Hence, its corresponding eigenvector indicates a direction of large local influence that is unrelated to the first one. Additional elements of the spectral decomposition of $\ddot{F}$ can be used in a similar fashion.
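The eigen-analysis above translates directly into a few lines of linear algebra. The sketch below is illustrative only: `Delta` and `Lddot` are hypothetical stand-ins for the $p \times q$ matrix $\Delta$ and the Hessian $\ddot{L}$ of an actual fitted model, and NumPy is assumed.

```python
import numpy as np

def acceleration_matrix(Delta, Lddot):
    """F = 2 * Delta' (-Lddot)^{-1} Delta, for Delta (p x q) and Lddot (p x p)."""
    return 2.0 * Delta.T @ np.linalg.solve(-Lddot, Delta)

def local_influence_summary(F):
    """Eigen-analysis of the symmetric acceleration matrix F: returns
    C_max (largest eigenvalue), v_max (its eigenvector), and the q basic
    curvatures (the diagonal elements of F)."""
    vals, vecs = np.linalg.eigh(F)
    return vals[-1], vecs[:, -1], np.diag(F)
```

Because $\ddot{F}$ is symmetric, `np.linalg.eigh` returns eigenvalues in ascending order, so the last eigenpair gives $C_{\max}$ and $v_{\max}$; the remaining eigenpairs give the orthogonal secondary directions discussed above.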
This derivation also shows how local influence techniques reduce the dimensionality of examining $LD(\omega)$ via a small number of curvatures and directional vectors. It is well known that the number of non-zero eigenvalues of a symmetric matrix such as $\ddot{F}$ is equal to its rank. Since it is a product of other matrices, the rank of $\ddot{F}$ is no more than the minimum of their ranks, which is typically $p$. In the regression examples given previously, $p = 3$ and $q = 35$, meaning that the local influence approach can summarize the 35-dimensional surface $LD(\omega)$ with three curvatures and directional vectors. Similarly, $LD^{[\theta_1]}(\omega)$ was summarized with just two, as shown in Figures 1.12(c) and (d). Despite the advantages of the eigen-analysis of $\ddot{F}$, one difficulty that arises is the occurrence of eigenvalues with multiplicities greater than one. For example, if $C_{\max} = \lambda_1$ has multiplicity greater than 1, then there is no unique $v_{\max}$ to plot. In these situations it may be more prudent to look at directed influence on single parameter estimates.

2.2 Local Influence Analysis of a Profile Likelihood Displacement

The profile likelihood displacement, $LD^{[\theta_1]}(\omega)$, was presented in Section 1.2 as a measure of influence directed through $\hat\theta_1$. Cook (1986) showed that the acceleration matrix can be expressed as

$$\ddot{F}^{[\theta_1]} = -2\Delta'(\ddot{L}^{-1} - B_{22})\Delta, \qquad (2.2)$$

where

$$B_{22} = \begin{pmatrix} 0 & 0 \\ 0 & \ddot{L}_{22}^{-1} \end{pmatrix}$$

and $\ddot{L}_{22}$ comes from the partition

$$\ddot{L} = \begin{pmatrix} \ddot{L}_{11} & \ddot{L}_{12} \\ \ddot{L}_{21} & \ddot{L}_{22} \end{pmatrix}, \qquad \ddot{L}_{ij} = \frac{\partial^2 L(\theta; y)}{\partial\theta_i\,\partial\theta_j'}\bigg|_{\theta=\hat\theta}.$$

This result is proved in Chapter 4 as a special case of Theorem 4.2. Beckman et al. (1987) considered directing influence on a single parameter estimate. They noted that in this case the acceleration matrix is the outer product of a vector, and hence has only one non-zero eigenvalue. Without loss of generality, suppose that we wish to direct influence on the first parameter estimate in $\hat\theta$.
The eigensystem is then given by

$$\lambda = 2\|\Delta'T\|^2, \qquad v \propto \Delta'T, \qquad (2.3)$$

where $T$ is the vector for which $TT' = -(\ddot{L}^{-1} - B_{22})$. It can be expressed as

$$T = \frac{1}{\sqrt{-c}}\begin{pmatrix} 1 \\ -\ddot{L}_{22}^{-1}\ddot{L}_{21} \end{pmatrix}, \qquad (2.4)$$

where $c = \ddot{L}_{11} - \ddot{L}_{12}\ddot{L}_{22}^{-1}\ddot{L}_{21}$.

2.3 Building Blocks of Local Influence for Regression

In this section, the building blocks of $\ddot{F}$ for a multiple linear regression are constructed. Subsequent sections apply these results to the response and variance perturbations used in Chapter 1.

2.3.1 Notation

- Unless otherwise noted, the derivations assume the presence of an intercept in the model, so that the residuals sum to zero. Centering and scaling the regressors tends to facilitate interpretation, but is not necessary.
- The number of regressors is $k = p - 1$.
- $H = X(X'X)^{-1}X'$.
- $X_i$ represents the $i$th column of $X$, and $x_i'$ represents the $i$th row of $X$.
- $r = (I_n - H)y$ represents the residuals.
- $r_i$ represents the residuals from fitting a model to the $i$th predictor using the other predictors. These can be considered as values of an adjusted predictor (i.e., adjusted for relationships with the other predictors). Note: $Hr_i = r_i$ and $(I - H)r_i = 0$.
- $X_{(i)}$ represents the $X$ matrix with the $i$th column removed. Similarly, $\beta_{(i)}$ is the regression parameter vector with the $i$th element removed, and $\Delta_{(i)}$ is $\Delta$ without the $i$th row.

2.3.2 Acceleration Matrix

The ML estimates of $\beta$ are the same as those of ordinary least squares, $\hat\beta = (X'X)^{-1}X'y$, and the ML estimate of $\sigma^2$ is $\hat\sigma^2 = r'r/n$. The $p \times p$ estimated asymptotic variance-covariance matrix of the MLEs is

$$-\ddot{L}^{-1} = \begin{pmatrix} \hat\sigma^2(X'X)^{-1} & 0 \\ 0' & \frac{2\hat\sigma^4}{n} \end{pmatrix}.$$

Partitioning $\Delta'$ into $[\Delta_\beta'\;\;\Delta_{\sigma^2}']$, the acceleration matrix is given by

$$\ddot{F} = 2\Delta'(-\ddot{L}^{-1})\Delta = 2\hat\sigma^2\Delta_\beta'(X'X)^{-1}\Delta_\beta + \frac{4\hat\sigma^4}{n}\Delta_{\sigma^2}'\Delta_{\sigma^2}.$$

2.3.3 Acceleration Matrix for Directed Influence on $\hat\sigma^2$

The acceleration matrix for directed influence on $\hat\sigma^2$ utilizes the matrix $\ddot{L}^{-1} - B_{22}$, which is given by

$$\ddot{L}^{-1} - B_{22} = \begin{pmatrix} 0 & 0 \\ 0' & -\frac{2\hat\sigma^4}{n} \end{pmatrix}.$$

This implies that the acceleration matrix for directed influence on $\hat\sigma^2$ is

$$\ddot{F}^{[\sigma^2]} = -2\Delta'(\ddot{L}^{-1} - B_{22})\Delta = \frac{4\hat\sigma^4}{n}\Delta_{\sigma^2}'\Delta_{\sigma^2}. \qquad (2.5)$$
Noting that $\Delta_{\sigma^2}'$ is in fact a $q \times 1$ vector, there will be a single eigenvalue and eigenvector, given by

$$\lambda_1 = \frac{4\hat\sigma^4}{n}\|\Delta_{\sigma^2}\|^2, \qquad v_1 \propto \Delta_{\sigma^2}'.$$

These quantities are the maximum curvature and its direction, respectively.

2.3.4 Acceleration Matrix for Directed Influence on $\hat\beta$

When directing influence on $\hat\beta$, we have

$$\ddot{L}^{-1} - B_{22} = \begin{pmatrix} -\hat\sigma^2(X'X)^{-1} & 0 \\ 0' & 0 \end{pmatrix}.$$

The resulting acceleration matrix is

$$\ddot{F}^{[\beta]} = 2\hat\sigma^2\Delta_\beta'(X'X)^{-1}\Delta_\beta. \qquad (2.6)$$

Note that this implies that directed local influence on $\hat\beta$ is equivalent to local influence for $\beta$ when treating $\sigma^2$ as known and equal to its MLE.

2.3.5 Acceleration Matrix for Directed Influence on a Single Regression Parameter

By applying the general results of Beckman et al. (1987), we can construct $T$ (2.4) for directed influence on $\hat\beta_1$ in a regression model. Using results for the inverse of a partitioned matrix, $TT' = -(\ddot{L}^{-1} - B_{22})$ can be expressed in terms of $d = X_1'X_1 - X_1'X_{(1)}(X_{(1)}'X_{(1)})^{-1}X_{(1)}'X_1$ and the vector $(X_{(1)}'X_{(1)})^{-1}X_{(1)}'X_1$ (2.7). We denote $(X_{(1)}'X_{(1)})^{-1}X_{(1)}'X_1$ by $\hat\gamma_1$, as it is the vector of regression parameter estimates from modeling $X_1$ as a linear combination of the other explanatory variables, and we note that $d = \|r_1\|^2$. This simplifies the notation of (2.7) greatly:

$$TT' = \frac{\hat\sigma^2}{\|r_1\|^2}\begin{pmatrix} 1 & -\hat\gamma_1' & 0 \\ -\hat\gamma_1 & \hat\gamma_1\hat\gamma_1' & 0 \\ 0' & 0 & 0 \end{pmatrix}. \qquad (2.8)$$

Using these results, we find

$$T = \frac{\hat\sigma}{\|r_1\|}\begin{pmatrix} 1 \\ -\hat\gamma_1 \\ 0 \end{pmatrix}, \qquad (2.9)$$

and this leads to the following expressions for the maximum curvature and its direction when directing influence on $\hat\beta_1$:

$$C_{\max} = 2\|\Delta'T\|^2 = \frac{2\hat\sigma^2}{\|r_1\|^2}\left\|\Delta_{\beta_1}' - \Delta_{\beta_{(1)}}'\hat\gamma_1\right\|^2, \qquad v_{\max} \propto \Delta'T \propto \Delta_{\beta_1}' - \Delta_{\beta_{(1)}}'\hat\gamma_1.$$

The next two sections apply these results to document more completely the two perturbation schemes given in Chapter 1.

2.4 Perturbations to the Response in Regression

The results of this section closely follow those of Schwarzmann (1991), with only minor additions. As in Chapter 1, a set of additive perturbations is introduced into the observed data to examine the influence of measurement error in the response.
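The single-eigensolution formulas (2.3)-(2.4) can be checked numerically. The sketch below is a hedged illustration, not code from the dissertation: a generic negative-definite Hessian `Lddot` and an arbitrary `Delta` stand in for quantities from a real fitted model.

```python
import numpy as np

def directed_influence_first(Delta, Lddot):
    """C_max and v_max for influence directed on the first parameter estimate,
    via the T vector of (2.3)-(2.4): T = (1, -L22^{-1} L21)' / sqrt(-c)."""
    g = np.linalg.solve(Lddot[1:, 1:], Lddot[1:, 0])   # L22^{-1} L21
    c = Lddot[0, 0] - Lddot[0, 1:] @ g                 # Schur complement
    T = np.concatenate(([1.0], -g)) / np.sqrt(-c)      # TT' = -(L^{-1} - B22)
    w = Delta.T @ T
    return 2.0 * (w @ w), w                            # C_max and v_max (unnormalized)
```

Because the directed acceleration matrix is the outer product $2(\Delta'T)(\Delta'T)'$, this avoids forming and decomposing the full $q \times q$ matrix.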
An $n \times 1$ vector of perturbations is introduced such that the perturbed likelihood is based on $Y + \omega \sim N(X\beta, \sigma^2 I_n)$. The null perturbation is $\omega_0 = 0_n$. Equivalent results can be obtained by perturbing the model of the mean to be $X\beta - \omega$, that is, by considering a model assuming $Y \sim N(X\beta - \omega, \sigma^2 I_n)$. Hence, the measurement error perturbation can be interpreted as a type of mean-shift outlier model; each perturbation is a "stand-in" for an additional parameter that could correct or exacerbate lack-of-fit.

2.4.1 Building Blocks

The perturbed likelihood is

$$L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta + \omega)'(y - X\beta + \omega) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(\tilde y'\tilde y + 2\omega'\tilde y + \omega'\omega),$$

where $\tilde y = y - X\beta$. The next step is construction of $\Delta$. Partial derivatives with respect to $\omega$ are as follows:

$$\frac{\partial L_\omega(\theta; y)}{\partial\omega} = -\frac{1}{2\sigma^2}(2\tilde y + 2\omega) = -\frac{1}{\sigma^2}(y - X\beta + \omega).$$

Second partial derivatives are then

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial\omega\,\partial\beta'} = \frac{1}{\sigma^2}X, \qquad \frac{\partial^2 L_\omega(\theta; y)}{\partial\omega\,\partial\sigma^2} = \frac{1}{\sigma^4}(y - X\beta + \omega).$$

Evaluating at the original model yields

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial\omega\,\partial\beta'}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0} = \frac{1}{\hat\sigma^2}X, \qquad \frac{\partial^2 L_\omega(\theta; y)}{\partial\omega\,\partial\sigma^2}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0} = \frac{1}{\hat\sigma^4}r.$$

From these quantities, the $n \times p$ matrix $\Delta'$ can be constructed:

$$\Delta' = \begin{pmatrix} \frac{1}{\hat\sigma^2}X & \frac{1}{\hat\sigma^4}r \end{pmatrix}.$$

Finally, the acceleration matrix is

$$\ddot{F} = 2\begin{pmatrix} \frac{1}{\hat\sigma^2}X & \frac{1}{\hat\sigma^4}r \end{pmatrix}\begin{pmatrix} \hat\sigma^2(X'X)^{-1} & 0 \\ 0' & \frac{2\hat\sigma^4}{n} \end{pmatrix}\begin{pmatrix} \frac{1}{\hat\sigma^2}X' \\ \frac{1}{\hat\sigma^4}r' \end{pmatrix} = \frac{2}{\hat\sigma^2}H + \frac{4}{n\hat\sigma^4}rr'. \qquad (2.10)$$

2.4.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$ is simply the second term of (2.10):

$$\ddot{F}^{[\sigma^2]} = \frac{4}{n\hat\sigma^4}rr'. \qquad (2.11)$$

Solving the characteristic equations $(\ddot{F}^{[\sigma^2]} - \lambda I)v = 0$ directly, the single eigensolution is

$$C_{\max} = \lambda_{\max} = \frac{4}{n\hat\sigma^4}\|r\|^2 = \frac{4}{\hat\sigma^2}, \qquad v_{\max} \propto r.$$

Therefore, as mentioned in Chapter 1, $v_{\max}$ for directed influence on $\hat\sigma^2$ is proportional to the residuals.

2.4.3 Directed Influence on $\hat\beta_j$

Without loss of generality, consider directed influence on $\hat\beta_1$. Rather than find the acceleration matrix, we proceed directly to determining the sole eigenvector, proportional to $\Delta'T$, and the corresponding curvature $2\|\Delta'T\|^2$, as per (2.3) and (2.9):

$$\Delta'T = \frac{\hat\sigma}{\|r_1\|}\cdot\frac{1}{\hat\sigma^2}\left[X_1 - X_{(1)}\hat\gamma_1\right] = \frac{1}{\hat\sigma\|r_1\|}\,r_1$$
and

$$C_{\max} = 2\|\Delta'T\|^2 = \frac{2\|r_1\|^2}{\hat\sigma^2\|r_1\|^2} = \frac{2}{\hat\sigma^2}, \qquad v_{\max} \propto r_1.$$

Therefore, the direction of maximum curvature for directed influence on $\hat\beta_1$ is proportional to the first adjusted predictor.

2.4.3.1 Directed influence on $\hat\beta$

The acceleration matrix for the profile likelihood displacement for $\hat\beta$ is the first term of (2.10):

$$\ddot{F}^{[\beta]} = \frac{2}{\hat\sigma^2}H. \qquad (2.12)$$

This matrix has a single non-zero eigenvalue of $\frac{2}{\hat\sigma^2}$ with multiplicity $k$, implying that there are an infinite number of directional vectors that have maximum curvature. Although not discussed by Schwarzmann (1991), one set of orthogonal solutions can be motivated by considering the canonical form of the model:

$$Y = Z\alpha + \epsilon, \qquad (2.13)$$

where $Z = X\varphi$, $\alpha = \varphi'\beta$, $\varphi = (\varphi_1 \cdots \varphi_k)$, and $\varphi_i$ is the $i$th eigenvector of $X'X$. Because $\varphi\varphi' = I_k$, this is simply a reparameterization of the original regression model. The new regressors, $Z_1 = X\varphi_1, \ldots, Z_k = X\varphi_k$, are known as the principal components, and they are mutually orthogonal. In addition, each satisfies the characteristic equations of $\ddot{F}^{[\beta]}$:

$$\left(\ddot{F}^{[\beta]} - \frac{2}{\hat\sigma^2}I\right)Z_i = \frac{2}{\hat\sigma^2}(H - I)X\varphi_i = 0. \qquad (2.14)$$

Therefore, the standardized principal component regressors can serve as an orthogonal set of directional vectors, each of which has maximum curvature. Note also that the $k$ adjusted regressors, $r_1, \ldots, r_k$, are also directions of maximum curvature, but they do not constitute an orthogonal set.

2.4.3.2 Influence on $\hat\theta$

The acceleration matrix $\ddot{F}$ given in (2.10) has two non-zero eigenvalues: $\frac{4}{\hat\sigma^2}$ and $\frac{2}{\hat\sigma^2}$, with the second one having multiplicity $k$. The first eigenvector is proportional to $r$, and the set of $k$ principal components can be used to complete an orthogonal set of eigenvectors. Confirmation of these solutions is straightforward.

2.5 Unequal Measurement Precision Applied to Regression

Cook introduced this perturbation in his original (1986) paper on local influence, and the results in this section come from his work.
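Before turning to the variance perturbation, the response-perturbation results above are easy to verify numerically. The following sketch uses simulated data (all names hypothetical): it builds the acceleration matrix (2.10) and checks that directed influence on $\hat\sigma^2$ has the single eigensolution with $v_{\max}$ proportional to the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 regressors
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
r = y - H @ y                            # residuals
s2 = r @ r / n                           # ML estimate of sigma^2

# Full acceleration matrix (2.10) and the sigma^2-directed part (2.11)
F = (2.0 / s2) * H + (4.0 / (n * s2**2)) * np.outer(r, r)
F_s2 = (4.0 / (n * s2**2)) * np.outer(r, r)
vals, vecs = np.linalg.eigh(F_s2)
C_max, v_max = vals[-1], vecs[:, -1]     # single eigensolution: C_max = 4/s2, v_max ∝ r
```

The rank-one structure of (2.11) makes the eigendecomposition trivial, but running it through `eigh` confirms the closed-form solution.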
An $n \times 1$ vector of perturbations is introduced such that the perturbed likelihood is based upon $Y \sim N(X\beta, \sigma^2 D(1/\omega))$, where $D(1/\omega)$ denotes a diagonal matrix with $i$th element $\frac{1}{\omega_i}$. The null perturbation is $\omega_0 = 1_n$. The $n$ basic curvatures would be the local versions of the $n$ Cook's Distances (Cook 1977; Cook 1979). In addition, the $n$ basic perturbations applied to $LD^{[\beta_j]}(\omega)$ are local influence analogs of the $n$ DFBETAS for $\hat\beta_j$.

A second interpretation of this perturbation scheme comes from the idea of case-weights. Case-weights are used in an analysis when each observation has been sampled with unequal probability. Under disproportionate sampling, cases are weighted by the inverse of their sampling probability in order to obtain unbiased regression parameter estimates. In the normal linear model, this weighting is numerically equivalent to perturbing the error variances. Hence, this perturbation scheme can be used to examine sensitivity to the assumption of random sampling.

2.5.1 Building Blocks

To begin, we develop the perturbed likelihood, which is

$$L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'D(\omega)(y - X\beta) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\tilde y'D(\omega)\tilde y,$$

where $\tilde y = y - X\beta$. The next step is construction of $\Delta$. The partial derivative with respect to $\beta$ is

$$\frac{\partial L_\omega(\theta; y)}{\partial\beta} = -\frac{1}{2\sigma^2}\left(-2X'D(\omega)y + 2X'D(\omega)X\beta\right) = \frac{1}{\sigma^2}X'D(\omega)\tilde y,$$

and the partial derivative with respect to $\sigma^2$ is

$$\frac{\partial L_\omega(\theta; y)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\tilde y'D(\omega)\tilde y.$$

Second partial derivatives are then

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial\beta\,\partial\omega'} = \frac{1}{\sigma^2}X'D(\tilde y), \qquad \frac{\partial^2 L_\omega(\theta; y)}{\partial\sigma^2\,\partial\omega'} = \frac{1}{2\sigma^4}\tilde y_{sq}',$$

and evaluating at the original model yields

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial\beta\,\partial\omega'}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0} = \frac{1}{\hat\sigma^2}X'D(r), \qquad \frac{\partial^2 L_\omega(\theta; y)}{\partial\sigma^2\,\partial\omega'}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0} = \frac{1}{2\hat\sigma^4}r_{sq}',$$

where $\bullet$ denotes the Hadamard (element-wise) product and $r_{sq} = r \bullet r$ is an $n \times 1$ vector with elements $r_i^2$.
Combining these, we can write

$$\Delta' = \begin{pmatrix} \frac{1}{\hat\sigma^2}D(r)X & \frac{1}{2\hat\sigma^4}r_{sq} \end{pmatrix}$$

and

$$\ddot{F} = \frac{2}{\hat\sigma^2}D(r)HD(r) + \frac{1}{n\hat\sigma^4}r_{sq}r_{sq}'. \qquad (2.15)$$

2.5.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$ is simply the second term of (2.15):

$$\ddot{F}^{[\sigma^2]} = \frac{1}{n\hat\sigma^4}r_{sq}r_{sq}'. \qquad (2.16)$$

Solving the characteristic equations directly, the single eigensolution is

$$C_{\max} = \frac{\|r_{sq}\|^2}{n\hat\sigma^4}, \qquad v_{\max} \propto r_{sq}.$$

This suggests that $\hat\sigma^2$ is most affected by variance perturbations when they are scaled according to the squared residuals.

2.5.3 Directed influence on $\hat\beta_j$

Without loss of generality, consider directed influence on $\hat\beta_1$. Again we proceed directly to determining the sole eigenvector, proportional to $\Delta'T$, and the corresponding curvature $2\|\Delta'T\|^2$, as per (2.3) and (2.9):

$$\Delta'T = \frac{\hat\sigma}{\|r_1\|}\cdot\frac{1}{\hat\sigma^2}D(r)\left[X_1 - X_{(1)}\hat\gamma_1\right] = \frac{1}{\hat\sigma\|r_1\|}\,r \bullet r_1, \qquad (2.17)$$

and

$$C_{\max} = \frac{2}{\hat\sigma^2\|r_1\|^2}\|r \bullet r_1\|^2, \qquad v_{\max} \propto r \bullet r_1. \qquad (2.18)$$

This suggests that maximum local influence of the perturbation on $\hat\beta_1$ is achieved by perturbing cases that have both large residuals and extreme values for the adjusted covariate. Cook (1986) provided additional details of how this result compares to the information in a detrended added-variable plot.

Also, we may compare the information in $v_{\max}$ with that provided by the DFBETAS diagnostic (Belsley et al. 1980). Recall that the value of $DFBETAS_{ij}$ is the standardized difference between $\hat\beta_j$ and $\hat\beta_{j(i)}$, the estimate when deleting the $i$th observation. The formula is given by

$$DFBETAS_{ij} = \frac{c_{ij}r_i}{s_{(i)}\|c_j\|(1 - h_{ii})},$$

where $r_i$ is the $i$th residual, $c_j$ is the $j$th column of $X(X'X)^{-1}$, $c_{ij}$ is the $i$th element of $c_j$, $h_{ii}$ is the $i$th diagonal element of $H$, and $s_{(i)}^2$ is the mean squared-error estimate of $\sigma^2$ with the $i$th observation deleted. The vector $c_j$ can be written in terms of the $j$th adjusted regressor by partitioning $(X'X)^{-1}$ as per (2.7). For $j = 1$,

$$c_1 = \begin{pmatrix} X_1 & X_{(1)} \end{pmatrix}\frac{1}{\|r_1\|^2}\begin{pmatrix} 1 \\ -\hat\gamma_1 \end{pmatrix} = \frac{1}{\|r_1\|^2}\,r_1.$$
Using this equality, the vector of $n$ DFBETAS for $\hat\beta_1$ is proportional to

$$D\!\left(\frac{1}{1-h_{ii}}\right)(r \bullet r_1), \qquad (2.19)$$

where $D\!\left(\frac{1}{1-h_{ii}}\right)$ is a diagonal matrix with $i$th element $\frac{1}{1-h_{ii}}$. Since $v_{\max} \propto r \bullet r_1$, observations with large leverage stand out more in a plot of the $n$ DFBETAS for $\hat\beta_1$ than in a plot of $v_{\max}$. This algebraic relationship between $v_{\max}$ and DFBETAS has not been noted by previous authors.

2.5.4 Directed influence on $\hat\beta$

The acceleration matrix for directed influence on $\hat\beta$ is the first term of (2.15):

$$\ddot{F}^{[\beta]} = \frac{2}{\hat\sigma^2}D(r)HD(r).$$

The eigensystem of the matrix does not have a closed form. However, some idea of the directed influence of this perturbation on $\hat\beta$ can be ascertained by utilizing basic perturbations. The $i$th basic perturbation corresponds to modifying the variance of only the $i$th case, and its curvature is the $i$th diagonal element of $\ddot{F}^{[\beta]}$ (Cook 1986):

$$C_{b_i} = \frac{2h_{ii}r_i^2}{\hat\sigma^2}. \qquad (2.20)$$

This is similar to the $i$th Cook's Distance (Cook 1977):

$$D_i = \frac{h_{ii}r_i^2}{k(1-h_{ii})^2 s^2}.$$

[Figure 2.1: Hills data; variance perturbation; (a) $v_2$ for $\hat\theta$, (b) $v_3$ for $\hat\theta$]

2.5.5 Influence on $\hat\theta$

The eigensystem of $\ddot{F}$ is not closed-form, but it can be found numerically. As an example, consider the perturbation of the error variances for the Scottish hills data. There are three non-zero eigenvalues for $\ddot{F}$: 14.82, 4.91, and 0.37. The eigenvector associated with the largest eigenvalue was plotted in Figure 1.10(a). It is similar to $v_{\max}$ for directed influence on $\hat\sigma^2$, as given in Figure 1.10(b). The other two eigenvectors for $\ddot{F}$ are plotted in Figure 2.1.

2.6 Benchmarks for Perturbation Analyses

An analyst using local influence diagnostics will want to know when a curvature is "too big" and when an element of $v_{\max}$ is "too big." Calibration of local influence diagnostics was the topic of much debate in the discussion and rejoinder to Cook (1986), and there is still no consensus opinion on the subject.
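Returning briefly to the variance perturbation, the algebraic link between $v_{\max}$ for $\hat\beta_1$ and case deletion can be confirmed by brute force. The sketch below (simulated data, hypothetical names) recomputes $\hat\beta_1$ with each case deleted and checks that the changes are exactly $(r \bullet r_1)/\left((1-h_{ii})\|r_1\|^2\right)$, i.e., $v_{\max}$ rescaled by leverage as in (2.19).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25
X = np.column_stack([rng.normal(size=n), np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
h = np.diag(X @ XtX_inv @ X.T)           # leverages h_ii
r = y - X @ beta                         # residuals

# adjusted first predictor: residuals from regressing X_1 on the other columns
X1, Xrest = X[:, 0], X[:, 1:]
g1 = np.linalg.solve(Xrest.T @ Xrest, Xrest.T @ X1)
r1 = X1 - Xrest @ g1

# v_max and C_max for directed influence on beta_1, as in (2.17)-(2.18)
s2 = r @ r / n
v_max = r * r1
C_max = 2.0 * (v_max @ v_max) / (s2 * (r1 @ r1))

# exact case-deletion changes in beta_1, for comparison with (2.19)
delta1 = np.array([
    beta[0] - np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(y, i), rcond=None)[0][0]
    for i in range(n)
])
```

The comparison shows numerically what (2.19) shows algebraically: the deletion diagnostics differ from $v_{\max}$ only through the leverage factors $1/(1-h_{ii})$.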
Much of the difficulty lies in the fact that the perturbation is introduced into the problem by the analyst. Because the technique does not involve hypothesis testing, critical values are not appropriate. However, there have been attempts to develop rule-of-thumb benchmarks, as exist for Cook's Distance and other common influence diagnostics. In this section, we discuss some general difficulties in developing benchmarks for perturbation analyses, as well as some issues specific to the local influence approach. Although it is argued that all sensitivity is "relative," some simple methods for assessing the size of curvatures and the size of the elements of directional vectors are presented.

2.6.1 The Analyst's Dilemma

As mentioned before, the perturbation is introduced by the analyst, and this implies that he/she must decide whether the effects of perturbation are important, much the way a researcher decides what effect size is important. Consider a situation in which the researcher is actually able to give the size or stochastic behavior of the perturbation. Here, we may concretely measure the effects of perturbation by whether or not the conclusion of a hypothesis test changes. Alternatively, the effects can be measured by whether or not the perturbed estimates are outside an unperturbed confidence ellipsoid. However, despite having a concrete measure of the perturbation's effects, assessing the practical relevance of the results still falls to the analyst.

2.6.2 Calibrating Curvatures in Local Influence

In addition to the general limitations of a perturbation analysis, an analyst using the local influence approach must also contend with the fact that the behavior and size of the perturbations are undetermined apart from the basic formulation. Loynes (1986) and Beckman et al. (1987) noted that curvatures do not have the invariance properties needed for a catch-all method of benchmarking. This necessitates a reliance upon relative influence.
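A minimal numerical illustration of this lack of invariance, under the assumption of a smooth elementwise coordinate change whose derivative at the null perturbation is 2 (a synthetic acceleration matrix stands in for $\ddot{F}$): the curvatures are multiplied by 4, while the eigenvectors, and hence $v_{\max}$, are unchanged.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 6))
F = A.T @ A                        # synthetic q x q acceleration matrix (q = 6)

# Coordinate change omega_i* = k(omega_i) with k'(omega_i0) = 2 for every i.
J = 2.0 * np.eye(6)                # Jacobian of the change, evaluated at the null
F_star = J.T @ F @ J               # acceleration matrix in the new coordinates
```

Because the Jacobian here is a scalar multiple of the identity, every curvature scales by the same factor ($2^2 = 4$) and all directions are preserved; a non-constant Jacobian would rescale different coordinates differently, which is why curvatures cannot be compared across perturbation schemes.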
2.6.2.1 1-to-1 coordinate changes in $\Omega$

Following Loynes (1986) and Beckman et al. (1987), let us consider 1-to-1 coordinate changes in the perturbation space $\Omega$. That is, a new perturbation scheme is constructed with elements $\omega_i^* = k(\omega_i)$ for $i = 1, \ldots, q$, for a smooth function $k$. The new scheme's perturbation, $\omega^*$, will not yield the same curvature as $\omega$. Specifically,

$$C_{v}^* = \left(\frac{\partial k(\omega_i)}{\partial\omega_i}\bigg|_{\omega_i=\omega_{i0}}\right)^2 C_v,$$

where $\omega_{i0}$ denotes the null perturbation value for $\omega_i$. As an example, consider perturbing the variance of $y_i$ to be $\frac{\sigma^2}{\omega_i^2}$ rather than $\frac{\sigma^2}{\omega_i}$. Here, $k(\omega_i) = \omega_i^2$, resulting in new curvatures that are $(2\omega_{i0})^2 = 4$ times the original ones. Hence, despite the fact that the two versions of $\Omega$ can produce the same set of perturbed models, the curvature in identical directions will change. This implies that a universal benchmark for curvatures is not possible. Indeed, given the arbitrariness of scaling the perturbation, it should come as no surprise that curvatures cannot be compared across perturbation schemes. However, the direction of maximum curvature is unchanged under 1-to-1 coordinate changes in $\Omega$.

2.6.2.2 Interpreting curvatures

In this section, a simple way to assess curvatures from the same perturbation scheme is given. First, $\frac{1}{2}a^2C_v$ is a Taylor series approximation to $LD(\omega_0 + av)$ (Lawrance 1991; Escobar and Meeker 1992). Also, assuming that $n$ is large, an asymptotic $100(1-\alpha)\%$ likelihood-based confidence region for $\theta$ is given by the values of $\theta$ such that

$$2\left(L(\hat\theta; y) - L(\theta; y)\right) \le \chi^2_{p,1-\alpha}.$$

Thus, if the curvature in direction $v$ is larger than twice $\chi^2_{p,1-\alpha}$, then a perturbation at a distance of one from $\omega_0$ in direction $v$ moves the parameter estimates to the edge of the asymptotic confidence region. To further develop this idea, let us define the size of a perturbation to be its Euclidean distance from the null perturbation (i.e., $\|\omega - \omega_0\|$).
Suppose further that we have perturbations in two different directions: $\omega_A$ in direction $v_A$ and $\omega_B$ in direction $v_B$. If $|C_A|$ is $k$ times larger than $|C_B|$, then perturbation $\omega_B$ must be $\sqrt{k}$ times larger than $\omega_A$ to have the same effect on the likelihood displacement. Equivalently, in order to perturb the estimates to the edge of the confidence ellipsoid, it would be necessary to travel $\sqrt{k}$ times farther in direction $v_B$ in $\Omega$ than in direction $v_A$. This provides a simple interpretation for the relative size of two curvatures.

2.6.2.3 Internal benchmarks for curvatures

Building on the interpretation given above, curvatures can be benchmarked based on their size relative to some sort of "average" curvature for a given scheme and fitted model. For instance, the average of the $q$ basic curvatures for the fitted model can serve as a benchmark. Alternatively, the average curvature could be computed by (1) allowing the squared elements of $v$ to have a Dirichlet$(a_1, \ldots, a_q)$ distribution with all $a_j$ equal to one, and then (2) finding the expected curvature. This would be equivalent to finding the average approximate displacement at a radius of $\sqrt{2}$ from $\omega_0$ (Cook 1986). If such a benchmark is difficult to compute, a Monte Carlo estimate can be calculated. Because these benchmarks are computed using curvatures from the fitted model, they result in comparisons that are internal to the fitted model. The conformal normal curvature of Poon and Poon (1999) would also be an internal way to assess the sizes of curvatures.

2.6.2.4 External benchmarks for curvatures

It is entirely possible that all of the curvatures of a fitted model are large due to the nature of the data set, and using internal benchmarks will not alert the researcher to such a phenomenon. To address this, a benchmark that is not dependent on the observed data is needed. To this end, we might utilize the expected value of the curvature in direction $v$, using $\hat\theta$ in place of the unknown parameters $\theta$.
The expected curvatures can then be plotted along with the observed curvatures (Cadigan and Farrell 1999). If the expectation of $C_{\max}$ is difficult to obtain, Monte Carlo simulations could be used to estimate it. As an alternative, Schall and Dunne (1992) developed a means of external benchmarking via a scaled curvature, using results on parameter orthogonality (Cox and Reid 1987).

2.6.2.5 Internal versus external benchmarks

Internal and external benchmarks are complementary tools, since each one provides different diagnostic information. Comparing curvatures from the same fitted model highlights relative sensitivity given the data and the design. Comparing curvatures to their expectations highlights whether the sensitivities are expected given the experimental or sampling design.

2.6.3 Calibration of Directional Vectors

For a directional vector from the eigensystem of $\ddot{F}$, the analyst must compare the individual elements in the vector to determine which aspects of the perturbation are most important. Typically, it is suggested that the analyst simply look for elements that stand out as being large relative to the rest. Alternately, a vector whose elements are all equal in size can serve as a means of comparison. That is to say, if each of the $q$ elements of the perturbation is contributing equally, then each is equal to either $\frac{1}{\sqrt{q}}$ or $-\frac{1}{\sqrt{q}}$. A particular element whose absolute value is larger than, say, 2 or 3 times that of equally contributing elements (e.c.e.) would then be of interest. Hence, the inclusion of horizontal lines at $\pm\frac{2}{\sqrt{q}}$ and $\pm\frac{3}{\sqrt{q}}$ in the plot of directional vectors may be useful.

2.7 Local Assessment Under Reparameterizations

Pan et al. (1997) show that $v_{\max}$ is invariant to any 1-to-1 measurable transformation of the parameter space $\Theta$. In fact, for $\theta = h(\phi)$, the entire eigensystem of $\ddot{F}_\phi$ is the same as that of $\ddot{F}_\theta$. However, directed influence on the
Using similar steps to Pan et al. (1997) and Escobar and Meeker (1992), we can write update formulas for A^ and L^ and then calculate directed influence on a subset of the redefined parameter estimates. Using partial diflFerentiation, the building blocks of the acceleration matrix under the parameterization </> can be written: de'.. de , (2.21) (2.22) where all partial derivatives are evaluated at the fitted model. From this, the acceleration matrix for directed influence on <p-^ can be obtained from the partition </> = (</>i,</»2)': ,[</>! -2A;(L ,00 -1 ^022 )^<^ = -2A where dc/) = -2A^(L, -^B,,,-)A„ (2.23) il^ ?>22 Zy^22 comes from the partition -t'011 L 4>12 ^4>2\ ^4>22 and all partial derivatives are again evaluated at the fitted model. Although Pan et al. (1997) suggested that (2.23) is invariant to the choice of 027 they did not provide a proof. To show this result, consider another 42 reparameterization for which the first pi elements are the same as those of (f>: V'l 01 xj,= = V'2 ^2 This implies that dtj)' Ipi d<f>2 9(p2 dxl>[ 9^2 (2.24) and this leads to the following expression for the lower right submatrix of L^: 8x1^2 dii)'2 d(f>' 89' ■■ 80 d(j> av'2 8e' .. 80 8(f> "50' d4>2 di>'2 (2.25) _ d^dO^i 00 8(p2 8'4)2d(j)2 ^8(1)2 811^2 8x1,2 '^^^ 8xp'2' where all partial derivatives are evaluated at the original model. Meanwhile, the acceleration matrix for directed influence on i/j^ is ,[1/-! -2a;(l; - = -2a;(l, )A, L-^ 1p22 80 8x1)' L -1 022 (2.26) 5^ 8x1) )Afi The invariance of the acceleration matrix for directed influence on </>i to the choice of 02 is proved by showing that (2.26) is equivalent to (2.23). Using the inverse of 43 (2.25), 80 dtj)' L-/.. 80' 80 8(f) 8x1) " a</)' 8^' . ° ^^i . d^d0[^ 8x1^ 8(j) [pi L-^l T M 'pi avi n ^ n ^02 r-l 902 9(9' L-^ 022 8(f>' (2.27) 50 _ 80 90 ~ 8(t>' _ 80 Substituting (2.27) into (2.26) completes the result. 
2.7.1 Directed Influence on a Linear Combination of Parameter Estimates

As mentioned in Escobar and Meeker (1992), a specific application of these results is to consider directed influence on a linear combination of parameter estimates, $x'\hat\theta$. There are many ways to reparameterize the model such that $x'\theta$ is one of the new parameters. For example, assuming that the first element of $x$ is non-zero, $\phi$ can be defined as $A\theta$, where

$$A = \begin{pmatrix} x_1 & x_{(1)}' \\ 0_{p-1} & I_{p-1} \end{pmatrix},$$

and $x_1$ and $x_{(1)}$ are the first and remaining elements of $x$, respectively (Escobar and Meeker 1992). Using equation (2.23), the acceleration matrix for directed influence on $\hat\phi_1 = x'\hat\theta$ can be computed, and the resulting diagnostics are invariant to the choice of $\phi_2, \ldots, \phi_p$. By using a prudent choice of new parameters, we can obtain easy-to-compute formulas for the single eigensolution of $\ddot{F}^{[x'\theta]}$, as per the following theorem.

Theorem 2.1 The maximum curvature and its direction for directed influence on $x'\hat\theta$ are given by

$$C_{\max} = \frac{2}{\|k_1\|^2}\left\|\Delta_\theta'(-\ddot{L}_\theta)^{-1}x\right\|^2, \qquad v_{\max} \propto \Delta_\theta'(-\ddot{L}_\theta)^{-1}x,$$

where $k_1 = (-\ddot{L}_\theta)^{-1/2}x$, so that $\|k_1\|^2 = x'(-\ddot{L}_\theta)^{-1}x$.

Proof of Theorem 2.1

To begin, consider a model with parameterization $\phi = K'(-\ddot{L}_\theta)^{1/2}\theta$, where $K$ is a full rank $p \times p$ matrix with columns $K = (k_1\; k_2\; \cdots\; k_p)$ such that

$$k_1 = (-\ddot{L}_\theta)^{-1/2}x, \qquad k_1'k_i = 0 \;\;(i \ne 1), \qquad k_i'k_i = k_1'k_1 \;\;(i = 2, \ldots, p).$$

By construction, $K'K = \|k_1\|^2I_p$. Using the update formulas in (2.21) and (2.22), it follows that

$$\Delta_\phi = \frac{\partial\theta'}{\partial\phi}\Delta_\theta = \frac{1}{\|k_1\|^2}K'(-\ddot{L}_\theta)^{-1/2}\Delta_\theta, \qquad -\ddot{L}_\phi = \frac{1}{\|k_1\|^2}I_p.$$

Next, we show that directed influence on $\hat\phi_1 = x'\hat\theta$ is not a function of $k_2, \ldots, k_p$. From equation (2.23), we find

$$\ddot{F}^{[\phi_1]} = -2\Delta_\phi'\left(\ddot{L}_\phi^{-1} - B_{\phi 22}\right)\Delta_\phi = \frac{2}{\|k_1\|^2}\,\Delta_\theta'(-\ddot{L}_\theta)^{-1/2}k_1k_1'(-\ddot{L}_\theta)^{-1/2}\Delta_\theta = \frac{2}{\|k_1\|^2}\,\Delta_\theta'\,\ddot{L}_\theta^{-1}xx'\ddot{L}_\theta^{-1}\,\Delta_\theta. \qquad (2.28)$$

The single eigensolution of this rank-one matrix yields the final solution for $C_{\max}$ and $v_{\max}$. $\square$

The construction of $\phi$ is worth some note. The premultiplication of $\theta$ by $K'(-\ddot{L}_\theta)^{1/2}$ orthogonalizes the parameters, in the sense that the off-diagonal elements of the observed information matrix for $\phi$ are zero.
(Note that the orthogonalization is with respect to the observed information matrix, as in Tawn (1987), rather than the expected information, as in Cox and Reid (1987).) First, $\alpha = (-\ddot{L}_\theta)^{1/2}\theta$ is a parameterization with an observed information matrix equal to the identity matrix. Then, $K$ is constructed to make orthogonal contrasts of $\alpha$, with the first column corresponding to the linear combination of interest. Finally, by scaling the columns of $K$ to all have the same length as $k_1$, $K$ becomes a scalar multiple of an orthonormal matrix, and this leads to the algebraic simplifications. Also, if $x'\theta = \theta_j$, the local influence diagnostics provided by the theorem are the same as those provided by (2.3), as per Beckman et al. (1987).

2.7.2 Application to Regression

As an example, consider perturbing the variances of a regression model and examining influence on a linear combination of the regression parameter estimates, $\hat\mu(x) = x'\hat\beta$. From (2.28), we find

$$\ddot{F}^{[x'\beta]} = \frac{2}{\hat\sigma^2\kappa}\,D(r)X(X'X)^{-1}xx'(X'X)^{-1}X'D(r), \qquad (2.29)$$

where $\kappa = x'(X'X)^{-1}x$. Further,

$$v_{\max} \propto D(r)X(X'X)^{-1}x \propto \widehat{\mathrm{Cov}}(X\hat\beta, x'\hat\beta) \bullet r, \qquad (2.30)$$

$$C_{\max} = \frac{2}{\hat\sigma^2\kappa}\left\|x'(X'X)^{-1}X'D(r)\right\|^2, \qquad (2.31)$$

where $\widehat{\mathrm{Cov}}(X\hat\beta, x'\hat\beta)$ is the vector of empirical covariances between $X\hat\beta$ and $x'\hat\beta$.

2.7.2.1 A DFFITS-type measure

Using the variance perturbation and directed influence on $x'\hat\beta$, a local influence analog of DFFITS can be defined. Recall that $DFFITS_i$ is defined as the externally studentized change in the $i$th fitted value when deleting the $i$th observation (Belsley et al. 1980):

$$DFFITS_i = \frac{\hat\mu(x_i) - \hat\mu(x_i)_{[i]}}{s_{[i]}\sqrt{h_{ii}}},$$

where $\hat\mu(x_i)_{[i]}$ is the estimate of $\mu(x_i)$ with the $i$th observation deleted and $s_{[i]}^2$ is the mean squared-error estimate of $\sigma^2$ with the $i$th observation deleted. A large value for $DFFITS_i$ indicates that the observation is somewhat "self-predicting." It can be shown that $DFFITS_i$ simplifies to

$$DFFITS_i = \frac{r_i\sqrt{h_{ii}}}{s_{[i]}(1 - h_{ii})}.$$
That is, only the i*'* observation is reweighted, and the curvature for directed influence on /i(cc,) is examined. The curvature for the i*'' basic perturbation is the i*'* diagonal element of Cft. = BasicFITSi = ^^. (2.32) A large value of BasicFITS indicates that the perturbation of the observation's precision/weighting greatly influences its fitted value. Thus it is a local influence analog of DFFITS. Coincidently, BasicFITSj is equivalent to the i^^ basic curvature for the acceleration matrix, as given in (2.20). That implies that for the variance perturbation, the n basic curvatures of LD{u>) are measuring influence on the n fitted values. While BasicFITSj is the f*'' basic curvature when directing influence on //(xj), we can also define MaxFITSi to be C^ax for directed influence on fi{xi). Using (2.31), . . 2 " C:nax. = MaxFITSj = T^ J] ^r-J. (2.33) By definition the, i*'' MaxFITS can be no smaller than the i^^ BasicFITS. While BasicFITS provides information about which observation's perturbation influences its fitted value, MaxFITS provides information about how sensitive each fitted value is relative to the others. • Example Consider examining influence on the fitted values for each of the 35 covariate profiles in the Scottish hills data set. Figure 2.2(a) displays the 35 BasicFITS (i.e., each of the basic curvatures for directed influence on the corresponding fi{xi)) . Race 7 is indicated as being influential on its fitted value, while race 18 is also shown 48 5 10 15 20 25 30 35 (a) BasicFITS • ' • • • 5 10 15 20 25 30 35 (b) MaxFITS I * ■' > t : '• ; I ,1 5 10 15 20 25 30 35 .: , .. (c) BasicFITS (stars connected by solid line) and MaxFITS (triangles connected by dashed line) Figure 2.2: Hills data; cr^D(l/u;) perturbation; influence on fitted values to be quite influential on its fitted value. Plot (b) shows the 35 MaxFITS (i.e. the C'max's for directed influence on each of the 35 fi{xi)). 
Since no point stands out as being large, the plot indicates that no fitted value is particularly sensitive to variance perturbations. Plot (c) shows both sets of diagnostics. The BasicFITS values are stars joined by a solid line, while the MaxFITS values are triangles joined by a dashed line. The closeness of the MaxFITS and BasicFITS values for race 7 indicates that this observation is somewhat self-predicting.

CHAPTER 3
PERTURBATIONS TO THE X MATRIX IN REGRESSION

In this chapter, the influence that infinitesimal errors in the X matrix can have on parameter estimates in a linear regression is addressed. First, previous results for single-column and single-row perturbations are reviewed. This is followed by a generalization of these results to perturbing all elements in the X matrix. It is shown that the eigensolution of F̈ is related to principal components regression, collinearity, and outliers. Finally, new graphical techniques are presented, with the results applied to the Scottish hills data set and an educational outcomes data set.

3.1 Perturbations to Explanatory Variables

Letting k denote the number of predictors, Cook (1986) proposed that an nk-sized perturbation can be incorporated into the design matrix in an additive fashion. These perturbations represent fixed errors in the regressors and should not be confused with random errors-in-variables techniques (Fuller 1987). The perturbation scheme can be used to assess the effects of small rounding or truncation errors in the explanatory variables. Upon perturbation, the X matrix becomes X + W, where

W = [ ω₁  ω_{n+1}  ω_{2n+1}  …  ω_{kn−n+1}
      ω₂  ω_{n+2}  ω_{2n+2}  …  ω_{kn−n+2}
      ω₃  ω_{n+3}  ω_{2n+3}  …  ω_{kn−n+3}
      ⋮
      ω_n  ω_{2n}   ω_{3n}   …  ω_{kn} ].

The use of a single subscript implies ω = vec(W), where the vec operator stacks the columns of the matrix into a single column. When particular elements of W greatly affect parameter estimation, it indicates that the corresponding element of X is influential.
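The column-stacking convention can be made concrete with a small numpy sketch; the dimensions and values here are arbitrary illustrations:

```python
import numpy as np

n, k = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(n, k))      # perturbation matrix, one column per regressor

# vec(W) stacks the columns of W into a single nk-vector, so the
# single-subscript element omega_{(j-1)n + i} equals W[i, j] (1-based i, j).
omega = W.flatten(order="F")     # column-major (Fortran) stacking = vec operator

i, j = 2, 3                      # observation i, regressor j (1-based)
assert omega[(j - 1) * n + (i - 1)] == W[i - 1, j - 1]
```

The `order="F"` flag is what makes `flatten` match the column-stacking vec operator rather than numpy's default row-major stacking.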
This allows the analyst to identify specific combinations of explanatory-variable values that influence estimation. As will be shown in Section 3.4, sensitivity of the regression parameter estimates to these kinds of perturbations is greater when there is collinearity amongst the regressors and when the model is calibrated to fit the observed data too closely.

3.2 Perturbations to a Single Column

The special case of only perturbing a single column of X was discussed in Cook (1986), Thomas and Cook (1989), and Cook and Weisberg (1991). This section recreates their results in detail.

3.2.1 Building Blocks

Here and in the following sections, d_i represents a vector with a 1 in the i-th position and zeros elsewhere. The length of the vector depends on the context of its use. Note that for d_i of size k,

d_i'(X'X)⁻¹d_i = 1/‖r_i‖²   (3.1)

and

X(X'X)⁻¹d_i = r_i/‖r_i‖²,   (3.2)

where r_i contains the residuals from regressing X_i on the other columns of X. Without loss of generality, suppose perturbations are added to the n values of the first predictor (i.e., X₁ becomes X₁ + ω). The perturbed likelihood is

L_ω(θ; y) ∝ −(n/2) log σ² − (1/(2σ²)) (y − Xβ − β₁ω)'(y − Xβ − β₁ω).   (3.3)

The next step is the construction of Δ. Taking partial derivatives with respect to ω, then second partial derivatives with respect to the parameters, and evaluating at the original model yields

∂²L_ω(θ; y)/∂ω∂β' |_{θ̂, ω₀} = (1/σ̂²)(r d₁' − β̂₁X),
∂²L_ω(θ; y)/∂ω∂σ² |_{θ̂, ω₀} = −(β̂₁/σ̂⁴) r,

and from these quantities we construct the n × p matrix

Δ' = (1/σ̂²) [ r d₁' − β̂₁X   −(β̂₁/σ̂²) r ],   (3.4)

where d₁ is a k × 1 vector. It follows that the acceleration matrix is given by

F̈ = (2/σ̂²)(r d₁' − β̂₁X)(X'X)⁻¹(d₁r' − β̂₁X') + (4β̂₁²/(nσ̂⁴)) r r'.   (3.5)

3.2.2 Directed Influence on σ̂²

The acceleration matrix for the profile likelihood displacement for σ² is given by
F̈^{[σ²]} = (4β̂₁²/(nσ̂⁴)) r r'.   (3.6)

Solving the characteristic equations (F̈^{[σ²]} − λI)v = 0 directly, the single eigensolution is

C_max = 4β̂₁²/σ̂²,  v_max ∝ r.   (3.7)

Therefore the direction of maximum curvature for directed influence on σ̂² is always proportional to the residuals regardless of which column of X is perturbed, but the corresponding curvature is dependent upon the effect size, |β̂₁|.

3.2.3 Directed Influence on β̂₁

Using the results of Beckman et al. (1987), the sole eigenvector is proportional to Δ'T and the corresponding curvature is 2‖Δ'T‖², as per (2.3):

v_max ∝ (r d₁' − β̂₁X)(X'X)⁻¹d₁ = (1/‖r₁‖²)(r − β̂₁r₁),   (3.8)

and

C_max = (2/σ̂²)( β̂₁² + ‖r‖²/‖r₁‖² ).   (3.9)

The direction of maximum curvature is a weighted average of the residuals and the adjusted regressor. As noted in Cook and Weisberg (1991), β̂₁r₁ = ŷ − ŷ_{(1)}, where ŷ_{(1)} is the vector of fitted values when X₁ is excluded from the model. Using this result, we find v_max ∝ 2r − r_{(1)}, where r_{(1)} is the vector of residuals when modeling y using all regressors except X₁. This indicates that the direction is dominated by the residuals, a fact which is demonstrated by examples later in this chapter. Cook and Weisberg (1991) discussed how v_max relates to de-trended added variable plots.

3.2.4 Directed Influence on Other Regression Parameter Estimates

Thomas and Cook (1989) examined the influence that perturbations in one column of X can have on the regression parameter estimate corresponding to a different column. Without loss of generality, suppose we perturb the first column of X and look at the effects on the last regression parameter estimate, β̂_k. Again using the results of Beckman et al. (1987), we can obtain the maximum curvature and its direction. Letting d₁ be of size k − 1,

v_max ∝ Δ'T ∝ γ̂_{k1} r + β̂₁ r_k,   (3.10)

where γ̂_{k1} is the regression parameter estimate corresponding to X₁ when predicting X_k with the other columns of the X matrix.
The maximum curvature is given by

C_max = (2/σ̂²) ‖ γ̂_{k1} r + β̂₁ r_k ‖² / ‖r_k‖² = (2/σ̂²)( β̂₁² + γ̂_{k1}² ‖r‖²/‖r_k‖² ).   (3.11)

When the partial correlation between X₁ and X_k is zero, C_max achieves its minimum value of 2β̂₁²/σ̂², yielding v_max ∝ r_k. In the event that X_k is a linear combination of the other regressors with non-zero γ̂_{k1}, C_max = ∞ and v_max ∝ r. Between these two extremes, the direction of maximum curvature is a weighted average of the residuals and the adjusted k-th regressor. These quantities are used to detect outliers and measure leverage, respectively. If the partial correlation between X₁ and X_k is as large as that between X₁ and y, then the two vectors are equally weighted. Note that this weighting is invariant to scale changes in the columns of the design matrix as well as the response.

3.2.4.1 Directed influence on β̂

Without loss of generality, we again consider perturbing the first column. The rank-k acceleration matrix is

F̈^{[β]} = (2/σ̂²)(r d₁' − β̂₁X)(X'X)⁻¹(d₁r' − β̂₁X').   (3.12)

The first eigenvalue of this matrix is unique, and yields a maximum curvature and corresponding direction of

C_max = (2/σ̂²)( β̂₁² + ‖r‖²/‖r₁‖² )  and  v_max ∝ r − β̂₁r₁.

Hence, the maximum curvature and its direction are identical to those for directed influence on β̂₁. Confirmation of this first eigensolution follows from (3.1), (3.2), and the facts X'r = 0, r'r₁ = 0, and X'r₁ = ‖r₁‖²d₁:

(d₁r' − β̂₁X')(r − β̂₁r₁) = (‖r‖² + β̂₁²‖r₁‖²) d₁,

so that

F̈^{[β]}(r − β̂₁r₁) = (2/σ̂²)( ‖r‖²/‖r₁‖² + β̂₁² )(r − β̂₁r₁).

The second eigenvalue is (2/σ̂²)β̂₁² and has multiplicity k − 1. As mentioned earlier, this implies that there is no unique set of orthogonal eigenvectors to complete the solution.

3.2.5 Influence on θ̂

The acceleration matrix for examining both β̂ and σ̂² simultaneously has k + 1 = p non-zero eigenvalues. The form of the eigensolution is unknown. My numerical studies indicate that there are three unique non-zero eigenvalues, one of which has multiplicity k − 1.
The one with higher multiplicity appears to be the second largest of the three, with a value of (2/σ̂²)β̂₁².

Example

Huynh (1982) and Fung and Kwan (1997) examined data from 20 randomly sampled schools from the Mid-Atlantic and New England states. The mean verbal test score (VSCORE) of each school's 6th graders is regressed on staff salaries per pupil (SALARY), % of white-collar fathers (WCOLR), a socio-economic composite score (SES), mean teacher's verbal score (TSCORE), and mean mother's educational level (MOMED).

Table 3.1: School data; Pearson correlation coefficients
         SALARY  WCOLR  SES   TSCORE  MOMED  VSCORE
SALARY   1.00
WCOLR    0.18    1.00
SES      0.22    0.82   1.00
TSCORE   0.50    0.05   0.18  1.00
MOMED    0.19    0.92   0.81  0.12    1.00
VSCORE   0.19    0.75   0.92  0.33    0.73   1.00

Appendix A contains model-fitting information and diagnostics for the regression analysis, with all variables centered and scaled. Diagnostic plots of the residuals, externally studentized residuals, Cook's distances, diagonal elements of the hat matrix, and DFBETAS appear in Figure 3.1. Ordinary least squares regression indicates cases 3 and 18 are outliers, although Huynh (1982) also identified observation 11 as a mild outlier. This data set also has some redundancy among the predictors SES, WCOLR, and MOMED (Table 3.1). Although the X matrix is not ill-conditioned, the association between SES and MOMED accounts for the unexpected negative slope estimate for MOMED. Table 3.2 shows the non-zero curvatures (eigenvalues) for perturbing single columns of the X matrix. Comparing the columns of the table provides information on which regressor's measurement is most influential. Here we see that the measurement of SES is typically the most influential, both for regression parameter estimates and for σ̂². Comparing the rows provides information on which parameter estimates are most influenced by these kinds of perturbations. Here we see that the WCOLR and MOMED estimates are the most sensitive.
These are the two regressors that are highly correlated with each other and with SES. It comes as no surprise that the results for collinear explanatory variables are more susceptible to measurement errors of this nature.

Figure 3.1: School data; diagnostic plots. (a) Residuals; (b) externally studentized residuals; (c) Cook's distances; (d) hat diagonals; (e) DFBETAS for SALARY; (f) DFBETAS for WCOLR; (g) DFBETAS for SES; (h) DFBETAS for TSCORE; (i) DFBETAS for MOMED.

Table 3.2: School data; curvatures using X₁ + ω perturbation
Estimates in LD(ω)          Column being perturbed
or LD^{[·]}(ω)    SALARY    WCOLR     SES       TSCORE    MOMED
θ̂                λ₁=4.144  λ₁=20.156 λ₁=50.008 λ₁=6.658  λ₁=19.065
                  λ₂=0.440  λ₂=0.847  λ₂=19.002 λ₂=1.413  λ₂=0.932
                  λ₃=0.094  λ₃=0.071  λ₃=14.441 λ₃=0.599  λ₃=0.091
β̂                λ₁=3.357  λ₁=18.534 λ₁=26.445 λ₁=4.432  λ₁=17.292
                  λ₂=0.440  λ₂=0.847  λ₂=19.002 λ₂=1.413  λ₂=0.932
β̂₁ (SALARY)      3.357     0.956     19.008    2.126     0.951
β̂₂ (WCOLR)       0.458     18.534    19.890    1.585     10.869
β̂₃ (SES)         0.443     2.957     26.445    1.501     1.640
β̂₄ (TSCORE)      1.129     1.859     19.220    4.432     1.295
β̂₅ (MOMED)       0.444     11.590    19.324    1.480     17.292
σ̂²               0.881     1.693     38.004    2.825     1.865

Since perturbing the SES variable is most influential, examining directional vectors for this perturbation is of interest. The element corresponding to case 18 is the most noticeable in the plot of v_max for the likelihood displacement (Figure 3.2). This observation has a large residual and is indicated as being influential in the various diagnostic plots in Figure 3.1. Also note the similarity of Figure 3.2 to the residuals (Figure 3.1(a)). Plots (a)-(e) of Figure 3.3 show the directions of maximum curvature when directing influence on each of the individual regression parameter estimates.
In general, the plots indicate different elements of X₃ to be influential on different β̂_i. Finally, Figure 3.3(f) provides the direction of maximum curvature when directing influence on σ̂². Recall that this vector is proportional to the residuals.

Figure 3.2: School data; X₃ + ω perturbation; influence on θ̂

3.3 Perturbations to a Single Row

The special case of only perturbing a single row of X was discussed in Thomas and Cook (1989) for generalized linear models. Here, detailed derivations for regression are presented. The results indicate that single-row perturbations have limited diagnostic value.

3.3.1 Building Blocks

Without loss of generality, suppose perturbations are added to the k regressor values of the first observation (i.e., x₁' becomes x₁' + ω'). For d₁ of order n, the perturbed likelihood is

L_ω(θ; y) ∝ −(n/2) log σ² − (1/(2σ²)) (y − Xβ − d₁ω'β)'(y − Xβ − d₁ω'β).   (3.13)

Partial derivatives with respect to ω are as follows:

∂L_ω(θ; y)/∂ω = (1/σ²)(y₁ − x₁'β − ω'β) β.   (3.14)

Figure 3.3: School data; X₃ + ω perturbation; directed influence plots. (a) v_max for β̂₁; (b) v_max for β̂₂; (c) v_max for β̂₃; (d) v_max for β̂₄; (e) v_max for β̂₅; (f) v_max for σ̂².

Since the null perturbation is 0, evaluating at ω₀ simplifies the expression. We then proceed to take second partial derivatives:

∂²L_ω(θ; y)/∂ω∂β' |_{ω₀} = (1/σ²)[ (y₁ − x₁'β) I_k − β x₁' ],
∂²L_ω(θ; y)/∂ω∂σ² |_{ω₀} = −(1/σ⁴)(y₁ − x₁'β) β.
Evaluating at θ̂ and combining the portions for the regression parameter estimates, we obtain

∂²L_ω(θ; y)/∂ω∂β' |_{θ̂, ω₀} = (1/σ̂²)( r₁ I_k − β̂ x₁' ),
∂²L_ω(θ; y)/∂ω∂σ² |_{θ̂, ω₀} = −(r₁/σ̂⁴) β̂,

where r₁ is the first residual. From these quantities we can construct the k × p matrix

Δ' = (1/σ̂²) [ r₁ I_k − β̂ x₁'   −(r₁/σ̂²) β̂ ].   (3.15)

The acceleration matrix is then given by

F̈ = (2/σ̂²)[ h₁₁ β̂β̂' + r₁²(X'X)⁻¹ − r₁ β̂ x₁'(X'X)⁻¹ − r₁ (X'X)⁻¹x₁ β̂' ] + (4r₁²/(nσ̂⁴)) β̂β̂',

where h₁₁ = x₁'(X'X)⁻¹x₁.

3.3.2 Directed Influence on σ̂²

The acceleration matrix for the profile likelihood displacement for σ̂² is given by

F̈^{[σ²]} = (4r₁²/(nσ̂⁴)) β̂β̂'.   (3.16)

Solving the characteristic equations (F̈^{[σ²]} − λI)v = 0 directly, the single eigensolution is

C_max = (4r₁²/(nσ̂⁴)) ‖β̂‖²,  v_max ∝ β̂.

Notice how the curvature depends upon the residual for the case that is being perturbed, but the directional vector is always proportional to β̂ regardless of which row of X is being perturbed. This suggests that v_max is not very useful for single-row perturbations. The values of C_max for each row can be compared to assess which row is most influential when perturbed. However, the relative effects only depend upon the residuals, indicating there is little value added by plotting the curvatures.

3.3.3 Directed Influence on β̂_j

Without loss of generality, we can direct influence on β̂_k. Let c_k denote the k-vector whose k-th element is 1 and whose remaining elements are the negated coefficients −γ̂ from regressing X_k on the other columns of X. Then

v_max ∝ Δ'T ∝ r₁ c_k + (x̂₁ₖ − x₁ₖ) β̂,

where x̂₁ₖ is the predicted value of x₁ₖ from modeling X_k as a linear combination of the other regressors, and

C_max = (2/(σ̂²‖r_k‖²)) ‖ r₁ c_k + (x̂₁ₖ − x₁ₖ) β̂ ‖².

More generally, the maximum local influence on β̂_j by perturbing row i is

C_max = (2/(σ̂²‖r_j‖²)) ‖ r_i c_j + (x̂_{ij} − x_{ij}) β̂ ‖².

If the columns of X have all been standardized, then each of the elements of γ̂ and of β̂ is less than 1. This suggests that element j of v_max is likely to be larger than the other elements. However, the signs of r_i and (x̂_{ij} − x_{ij})β̂_j can work to cancel each other out.
Further interpretation is suppressed in lieu of the more general perturbation presented in Section 3.4. Note that the curvature and directional vector change if one or more of the columns of X are rescaled.

Figure 3.4: School data; maximum curvatures for single-row perturbations

3.3.4 Influence on β̂ and θ̂

The acceleration matrix for directed influence on β̂ is

F̈^{[β]} = (2/σ̂²)[ h₁₁ β̂β̂' + r₁²(X'X)⁻¹ − r₁ β̂ x₁'(X'X)⁻¹ − r₁ (X'X)⁻¹x₁ β̂' ].   (3.17)

The eigensystem of this matrix is unknown, but my numerical examples indicate that it has k distinct non-zero eigenvalues. The eigensystem of F̈ for the full set of parameter estimates is also unknown.

3.3.5 Example

The values of school 18's regressors are most influential when perturbing rows of X one at a time, as evidenced by a large maximum curvature (Figure 3.4). For perturbing row 18, v_max is nearly proportional to β̂ (Figure 3.5). Note that although element 3 of v_max appears quite large, it is actually less than the benchmark.

Figure 3.5: School data; x₁₈' + ω' perturbation; v_max for θ̂

3.4 Perturbations to the Entire Design Matrix

At this point, the general case of perturbing all nk elements of the design matrix is considered. We use W to denote the perturbations in matrix form:

W = [ ω₁  ω_{n+1}  ω_{2n+1}  …  ω_{kn−n+1}
      ω₂  ω_{n+2}  ω_{2n+2}  …  ω_{kn−n+2}
      ⋮
      ω_n  ω_{2n}   ω_{3n}   …  ω_{kn} ],

and ω_j to denote the j-th column of W.

3.4.1 Building Blocks

The derivations in this section only require the presence of an intercept in the model (or centering of the response y). However, for ease of interpretation, we assume the columns of X have been centered and scaled. The perturbed likelihood is

L_ω(θ; y) ∝ −(n/2) log σ² − (1/(2σ²)) (y − Xβ − Wβ)'(y − Xβ − Wβ).
(3.18)

Proceeding in a similar fashion to the derivations for perturbing a single column, the partial derivative with respect to ω is

∂L_ω(θ; y)/∂ω = (1/σ²) β ⊗ (y − Xβ − Wβ).   (3.19)

Since the null perturbation is 0, evaluating at ω₀ simplifies the expression:

∂L_ω(θ; y)/∂ω |_{ω₀} = (1/σ²) β ⊗ (y − Xβ).   (3.20)

We then proceed to take second partial derivatives and evaluate at θ̂:

∂²L_ω(θ; y)/∂ω∂β' |_{θ̂, ω₀} = (1/σ̂²)( I_k ⊗ r − β̂ ⊗ X ),
∂²L_ω(θ; y)/∂ω∂σ² |_{θ̂, ω₀} = −(1/σ̂⁴)( β̂ ⊗ r ).

From these quantities we can construct the nk × p matrix Δ', as given by Cook (1986):

Δ' = (1/σ̂²) [ I_k ⊗ r − β̂ ⊗ X   −(1/σ̂²)(β̂ ⊗ r) ],   (3.21)

where ⊗ denotes the Kronecker product (Searle 1982). The next step is to construct the acceleration matrix. Here we use the fact that (A ⊗ B)(C ⊗ D) = AC ⊗ BD, assuming conformability of the matrices. Noting that A = A ⊗ 1 = 1 ⊗ A, the acceleration matrix can be written as

F̈ = (2/σ̂²)( I_k ⊗ r − β̂ ⊗ X )(X'X)⁻¹( I_k ⊗ r' − β̂' ⊗ X' ) + (4/(nσ̂⁴))( β̂β̂' ⊗ rr' ).   (3.22)

3.4.2 Directed Influence on σ̂²

The acceleration matrix for the profile likelihood displacement for σ̂² is the second term of (3.22):

F̈^{[σ²]} = (4/(nσ̂⁴))( β̂β̂' ⊗ rr' ).   (3.23)

This matrix has a single non-zero eigenvalue, (4/σ̂²)‖β̂‖², with corresponding eigenvector v ∝ β̂ ⊗ r. This solution is confirmed below:

( F̈^{[σ²]} − λI ) v ∝ (4/(nσ̂⁴))( β̂β̂' ⊗ rr' )( β̂ ⊗ r ) − (4/σ̂²)‖β̂‖²( β̂ ⊗ r )
= (4/(nσ̂⁴)) ‖β̂‖² ‖r‖² ( β̂ ⊗ r ) − (4/σ̂²)‖β̂‖²( β̂ ⊗ r )   (3.24)
= 0,

using ‖r‖² = nσ̂². This result implies maximum local influence on σ̂² is achieved by perturbations in each column that are proportional to the residuals, scaled by the corresponding β̂_j.

3.4.3 Directed Influence on β̂

In this section, we derive the full eigensystem of F̈^{[β]}, where

F̈^{[β]} = (2/σ̂²)( I_k ⊗ r − β̂ ⊗ X )(X'X)⁻¹( I_k ⊗ r' − β̂' ⊗ X' ).

The following theorem provides the full eigensystem of F̈^{[β]} for the X + W perturbation in a regression analysis. Note that the eigenvalues in the theorem appeared in Cook (1986), but the eigenvectors appear here for the first time.
Theorem 3.1 Recall that φ_j is the j-th eigenvector of X'X and that Xφ_j is the j-th principal component. Letting δ_j denote the j-th eigenvalue of X'X, the k non-zero eigenvalues and eigenvectors of F̈^{[β]} are

λ_j = 2( n/δ_{k−j+1} + ‖β̂‖²/σ̂² ),  v_j ∝ φ_{k−j+1} ⊗ r − β̂ ⊗ Xφ_{k−j+1},   (3.25)

for j = 1, …, k.

Proof of Theorem 3.1

Because the algebraic expressions are so long, the eigensolutions are confirmed in two steps. Write M = I_k ⊗ r − β̂ ⊗ X, so that F̈^{[β]} = (2/σ̂²) M (X'X)⁻¹ M'. Writing φ = φ_{k−j+1} and δ = δ_{k−j+1}, and using X'r = 0, we have

M' v_j = ( I_k ⊗ r' − β̂' ⊗ X' )( φ ⊗ r − β̂ ⊗ Xφ )
= φ ‖r‖² − β̂'φ (X'r) − β̂ ⊗ r'Xφ + ‖β̂‖² X'Xφ
= ( ‖r‖² + δ‖β̂‖² ) φ.   (3.26)

Next, (X'X)⁻¹ M' v_j = (( ‖r‖² + δ‖β̂‖² )/δ) φ, and

M φ = φ ⊗ r − β̂ ⊗ Xφ = v_j.   (3.27)

Combining (3.26) and (3.27), confirmation of the j-th eigensolution is immediate:

F̈^{[β]} v_j = (2/σ̂²)( ‖r‖²/δ + ‖β̂‖² ) v_j = 2( n/δ_{k−j+1} + ‖β̂‖²/σ̂² ) v_j,   (3.28)

using ‖r‖² = nσ̂². Finally, for j ≠ j*, the orthogonality of the solutions follows:

( φ_j ⊗ r − β̂ ⊗ Xφ_j )'( φ_{j*} ⊗ r − β̂ ⊗ Xφ_{j*} ) = φ_j'φ_{j*} ‖r‖² − 0 − 0 + ‖β̂‖² φ_j'X'Xφ_{j*} = 0. □

3.4.3.1 Interpreting Theorem 3.1

The closed-form eigensolution in Theorem 3.1 gives insight into how slight changes to the observed explanatory variables affect inference. To interpret the theorem, consider the maximum curvature, as per (3.25):

C_max = 2( n/δ_k + ‖β̂‖²/σ̂² ).   (3.29)

The eigenvalues of X'X are traditionally used to assess the amount of collinearity in the regressors. If δ_k is small, there is redundancy in the explanatory variables, and this causes the inversion of X'X to be unstable. If this inversion is unstable, so too are the regression parameter estimates, β̂ = (X'X)⁻¹X'y. The first term in C_max is large when δ_k is small, giving further evidence that the estimates are more sensitive when collinearity is present.
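The eigenpairs of Theorem 3.1, and the growth of C_max as δ_k shrinks, can be checked numerically. A numpy sketch with simulated data, taking σ̂² to be the maximum likelihood estimate ‖r‖²/n (the data, seed, and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 25, 3
X = rng.normal(size=(n, k))
X -= X.mean(axis=0)                        # centered regressors, as assumed above
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.3, size=n)
y -= y.mean()

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta_hat
sigma2 = r @ r / n                         # MLE of sigma^2

# F^[beta] = (2/sigma^2) M (X'X)^{-1} M'  with  M = I_k (x) r - beta_hat (x) X
M = np.kron(np.eye(k), r[:, None]) - np.kron(beta_hat[:, None], X)
F = (2.0 / sigma2) * M @ np.linalg.inv(X.T @ X) @ M.T

delta, phi = np.linalg.eigh(X.T @ X)       # eigenvalues/eigenvectors of X'X
for j in range(k):
    v = np.kron(phi[:, j], r) - np.kron(beta_hat, X @ phi[:, j])
    lam = 2.0 * (n / delta[j] + beta_hat @ beta_hat / sigma2)
    assert np.allclose(F @ v, lam * v)     # (lam, v) is an eigenpair of F^[beta]

# C_max = 2(n/delta_k + ||beta_hat||^2/sigma^2) corresponds to the smallest
# eigenvalue of X'X, so it grows as the regressors become more collinear.
c_max = 2.0 * (n / delta.min() + beta_hat @ beta_hat / sigma2)
assert np.isclose(c_max, np.linalg.eigvalsh(F).max())
```

The `np.kron` calls build I_k ⊗ r and β̂ ⊗ X under the same column-stacking (vec) convention used for ω, so the eigenvector check is a direct transcription of (3.25).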
The second term of C_max is also interesting. It can be considered a naive measure of goodness of fit: when the relationships between the explanatory variables and y are strong, ‖β̂‖² is large and σ̂² is small. Hence, "better fitting" models are more sensitive to perturbation. However, ‖β̂‖²/σ̂² typically grows whenever additional variables are added to the model, indicating that more complicated models are also more sensitive to perturbation. In fact, ‖β̂‖²/σ̂² bears some similarity to R²:

R² = ‖Xβ̂‖² / y'y = ‖Xβ̂‖² / ( nσ̂² + ‖Xβ̂‖² ).

The quantity R² is the proportion of variance explained by the model, and serves as a measure of goodness-of-fit. However, R² can be made artificially large by adding additional explanatory variables, even when they contribute very little to the explanatory value of the model. The quantity ‖β̂‖²/σ̂² behaves similarly: additional variables decrease the denominator and usually increase the numerator. Of course, the scale of the regressors is a major factor in the size of ‖β̂‖², indicating that scaling the regressors would be prudent for interpretation.

So, we may interpret C_max in the following way. The vector of parameter estimates is more sensitive when (a) there are collinear variables, (b) there are a large number of explanatory variables, and (c) the model closely fits the data. We are not surprised to see that the first two phenomena are sources of sensitivity. However, the suggestion that better-fitting models are more sensitive to measurement error may at first be disconcerting. When viewed in light of Type I and Type II errors, this is not the case. Recall that a Type I error occurs when a test is wrongly declared significant, while a Type II error occurs when a test is wrongly declared insignificant. Generally, a Type I error is considered the worse of the two. So, if there are strong relationships between X and y, the formula for C_max implies these might be missed because of measurement error in the regressors.
However, if there is a weak relationship, perturbations to X would not make the relationship appear strong. In this sense, the regression model is better protected against Type I errors (concluding there are strong relationships when there are none) than Type II errors (concluding there are weak relationships when in fact there are strong ones) when there is measurement error in the regressors. As a final comment on C_max, the term ‖β̂‖²/σ̂² may be better described as an indicator of a closely fitting model rather than a well-fitting model. This suggests that if we construct complicated models to fit the nuances of a data set, the resulting parameter estimates are more sensitive to rounding and measurement errors in the explanatory variables.

Now let us consider v_max as per Theorem 3.1:

v_max ∝ φ_k ⊗ r − β̂ ⊗ Xφ_k.   (3.30)

This vector and the other eigenvectors of F̈^{[β]} provide detailed information about which elements of X most affect the stability of the MLEs. The expression for v_max shows how outliers and leverage in the final principal component regressor determine precisely what kinds of mismeasurements have large local influence on β̂. The directional vector can be used to scale W to achieve maximum local influence:

W_max ∝ [ φ_{k1}r − β̂₁Xφ_k   φ_{k2}r − β̂₂Xφ_k   …   φ_{kk}r − β̂_kXφ_k ].   (3.31)

The elements in the first column are a weighted average of the residuals (r) and the regressor values of the last principal component regressor (Xφ_k). The respective weights are the first original regressor's contribution to the final principal component (φ_{k1}) and the negative of the first regression parameter estimate (−β̂₁). Alternatively, we can think of the elements in the first row as being a weighted average of the explanatory variables' contributions to the final principal component (φ_k') and the negative of the parameter estimates (β̂').
The respective weights are the first residual (r₁) and the negative of the first observation's value for the final principal component (−x₁'φ_k).

3.4.3.2 A special case

In order to examine in more detail how the perturbations given in (3.31) affect β̂, we consider a simple situation for which there are closed forms for the perturbed parameter estimates. Let us assume that the response and a pair of regressors have been centered and scaled to have length one, and that the regressors are orthogonal. The implications of these simplifications are as follows:

‖y‖ = ‖X₁‖ = ‖X₂‖ = 1,  r'r = 1 − β̂₁² − β̂₂²,  β̂₁ = X₁'y,  β̂₂ = X₂'y,  δ₁ = δ₂ = 1,  φ = I₂.

The first two eigenvalues of F̈^{[β]} are the same, and we may choose either of the following vectors for v_max:

[ r − β̂₁X₁   −β̂₂X₁ ]  or  [ −β̂₁X₂   r − β̂₂X₂ ].

Without loss of generality we choose the first one. Letting X_ω denote the perturbed design matrix, we have

X_ω = X + aW = X + a[ r − β̂₁X₁   −β̂₂X₁ ] = [ ar + (1 − aβ̂₁)X₁   X₂ − aβ̂₂X₁ ] = [ X_{1,ω}   X_{2,ω} ]   (3.32)

for some real number a. Now, consider the effects of perturbation on β̂₁. Assuming a no-intercept model is still fit after perturbation, the perturbed estimate is given by

β̂_{1,ω} = r_{1,ω}'y / ‖r_{1,ω}‖²,   (3.33)

where r_{1,ω} contains the residuals from fitting X_{1,ω} as a function of X_{2,ω}. The building blocks needed are as follows:

H_{2,ω} = X_{2,ω}(X_{2,ω}'X_{2,ω})⁻¹X_{2,ω}' = (1/(1 + a²β̂₂²))(X₂ − aβ̂₂X₁)(X₂ − aβ̂₂X₁)'
r_{1,ω} = (I − H_{2,ω})X_{1,ω} = (1 − aβ̂₁ − acβ̂₂)X₁ + cX₂ + ar
r_{1,ω}'y = (1 − aβ̂₁ − acβ̂₂)β̂₁ + cβ̂₂ + a(1 − β̂₁² − β̂₂²)
‖r_{1,ω}‖² = (1 − aβ̂₁ − acβ̂₂)² + c² + a²(1 − β̂₁² − β̂₂²),

where c = aβ̂₂(1 − aβ̂₁)/(1 + a²β̂₂²). It follows that

β̂_{1,ω} = [ (1 − aβ̂₁ − acβ̂₂)β̂₁ + cβ̂₂ + a(1 − β̂₁² − β̂₂²) ] / [ (1 − aβ̂₁ − acβ̂₂)² + c² + a²(1 − β̂₁² − β̂₂²) ].

For the sake of comparison, we can also consider perturbing only the first column. Perturbation in the direction of v_max when directing effects on β̂₁ results in the following perturbed X matrix:

X_ω = X + a*W* = X + a*[ r − β̂₁X₁   0 ] = [ a*r + (1 − a*β̂₁)X₁   X₂ ],   (3.34)

as per (3.8).
Here, the perturbed estimate is given by

β̂_{1,ω} = [ (1 − a*β̂₁)β̂₁ + a*(1 − β̂₁² − β̂₂²) ] / [ (1 − a*β̂₁)² + a*²(1 − β̂₁² − β̂₂²) ].

In order for the two perturbations to be comparable, they should be equal in size. This is accomplished by stacking the columns of the perturbation matrices and making the resulting vectors equal length. In other words, the Frobenius norms of aW and a*W* should be equal; setting a* = a/√(1 − β̂₂²) provides the necessary adjustment. Now we can plot β̂_{1,ω} for some selected values of β̂₁ and β̂₂, considering values of a ranging from, say, −√2 to √2. This eliminates perturbation matrices with Frobenius norms that are larger than that of the unperturbed X matrix. Figure 3.6 shows plots of β̂_{1,ω} for four sets of parameter estimates. The solid lines represent the parameter estimate in the direction v_max for the two-column perturbation, while the dashed lines represent parameter estimates in the direction v_max for the one-column perturbation. All of the influence curves in plots (a) and (b) have a value of .2 under the null perturbation, i.e. when a = 0. Likewise, the curves in plots (c) and (d) are all equal to .5 at the null perturbation. In all plots we see that the single-column perturbation does not have the same effect as the dual-column perturbation. The discrepancy between their effects is larger when β̂₂ is larger (plots (b) and (d)).

3.4.4 Influence in Principal Components Regression

Consider the canonical form of the model, as given in Chapter 1:

Y = Zα + ε,   (3.35)

where Z = Xφ and α = φ'β. The following theorem proves that the direction of maximum curvature is the direction of maximum directed influence on α̂_k.

Theorem 3.2 Let α̂ denote the MLE of α for model (3.35). The pairs λ_j, v_j given in Theorem 3.1 are the maximum curvature and direction of maximum curvature for directed influence on α̂_{k−j+1}, j = 1, …, k.
Figure 3.6: Effects of single- and double-column perturbations on β̂₁. Solid line: double column; dashed line: column 1 only. (a) Influence curves for β̂₁ = .2, β̂₂ = .1; (b) influence curves for β̂₁ = .2, β̂₂ = .4; (c) influence curves for β̂₁ = .5, β̂₂ = .4; (d) influence curves for β̂₁ = .5, β̂₂ = .7.

Proof of Theorem 3.2

We begin by obtaining update formulas for Δ and L̈ as per Section 2.7:

Δ_α' = Δ_β'φ = (1/σ̂²)( φ ⊗ r − β̂ ⊗ Xφ ),   (3.36)
(−L̈_α)⁻¹ = σ̂²( φ'X'Xφ )⁻¹ = σ̂² D(1/δ).   (3.37)

Next, we find maximum directed influence on α̂_{k−j+1} using the results of Beckman et al. (1987):

v_max ∝ Δ_α'T_{α,k−j+1} ∝ ( φ ⊗ r − β̂ ⊗ Xφ ) d_{k−j+1} = φ_{k−j+1} ⊗ r − β̂ ⊗ Xφ_{k−j+1},

and

C_max = 2( φ_{k−j+1}' ⊗ r' − β̂' ⊗ φ_{k−j+1}'X' )( φ_{k−j+1} ⊗ r − β̂ ⊗ Xφ_{k−j+1} ) / ( σ̂² δ_{k−j+1} )
= 2( ‖r‖² + δ_{k−j+1}‖β̂‖² ) / ( σ̂² δ_{k−j+1} )
= 2( n/δ_{k−j+1} + ‖β̂‖²/σ̂² ).

Since these expressions are the j-th eigensolution of F̈^{[β]}, the proof is complete. □

This theorem gives added insight into the direction of maximum curvature. In short, v_max is the direction that changes α̂_k the most. Similarly, the j-th largest eigenvalue and associated eigenvector specifically correspond to effects on α̂_{k−j+1}. Also, the sizes of the curvatures show that the estimated coefficients for the most important principal component regressors are also the least sensitive.

Example 1

Consider the Scottish hills data set described in Chapter 1. The purpose of this example is twofold: to demonstrate how the X-matrix results can be displayed graphically, and to demonstrate how collinearity affects the curvature values when examining influence on β̂. Figure 3.7 gives the direction of maximum curvature for directed influence on σ̂². The first 35 elements correspond to perturbations of the DISTANCE regressor values, while elements 36-70 correspond to perturbations of the CLIMB regressor values. The value for race 18's DISTANCE is the largest contributor to this influential perturbation.
Elements 7 (DISTANCE for race 7) and 53 (CLIMB for race 18) are also large. The information in Figure 3.7 can be difficult to discern because the perturbation vector enters into the likelihood in the form of a matrix, W. Another way to display v_max is to first take its absolute value and then to consider the elements corresponding to each column of W. Each portion can then be plotted against indices 1, …, n and shown together (Figure 3.8(a)). This graphical display provides a "profile plot" of each column of W's contribution to the direction of maximum curvature. Here we again see the elements corresponding to observations 18 and 7 as being the most influential. There is also some evidence that measurement errors in the DISTANCE values (solid line) are more influential than errors in the CLIMB values (dashed line). Figures 3.8(b) and 3.8(c) provide profile plots of the absolute values of the two directional vectors corresponding to v₁ and v₂ of F̈^{[β]}. The first plot draws attention to observations 7, 11, and 18, while no observations appear noteworthy in the second plot.

At this point we turn our attention to the curvature values that accompany these directional vectors (Table 3.3). The first column shows the curvatures for local

Table 3.3: Hills data; curvatures using X + W perturbation. Model I contains DISTANCE and CLIMB. Model II contains DISTANCE and CLIMB-RATE.
Estimates in LD(ω) or LD^{[·]}(ω)   Model I       Model II
σ̂²                                  λ₁ = 30.47    λ₁ = 28.52
β̂                                   λ₁ = 21.15    λ₁ = 16.41
                                     λ₂ = 16.49    λ₂ = 16.25

influence when fitting the regression model for winning time as a linear combination of DISTANCE and CLIMB. However, as was evident in Figure 1.5(c), the two regressors are linearly related. An alternative model can be fit by replacing climb in feet with climb in feet per distance travelled, CLIMB-RATE. This regressor does not have the strong positive association with distance that the original coding of CLIMB has.
This should reduce the collinearity in the model, thereby reducing sensitivity to perturbations in the X matrix. Using standardized regressors, the new model is fit. Column 2 of Table 3.3 shows the curvature values for this new model. $C_{\max}$ for directed influence on $\hat\beta$ has been substantially reduced, indicating that the MLEs for the second model are indeed less sensitive to perturbations in the X matrix. Figures 3.9(a), (b) and (c) provide the profile plots of the absolute values of the directional vectors that accompany the curvatures in Table 3.3, Column II. All three plots tend to indicate DISTANCE values as being more influential than CLIMB-RATE values, and all three plots show substantial change from those in Figure 3.8. The plot for $\hat\sigma^2$ now indicates the DISTANCE values for observations 7 and 18 as influential, rather than just 18. No elements stand out in $v_{\max}$, while case 11's DISTANCE value is a very substantial contributor to $v_2$.

Figure 3.7: Hills data; X + W pert.; $v_{\max}$ for $\hat\sigma^2$.

Example 2

We now consider applying the X + W perturbation to the school data and examining directed influence on $\hat\beta$. The purpose of this example is to demonstrate alternative methods for displaying $v_{\max}$ when the number of regressors and/or observations is large. Figure 3.10 is the plot of $v_{\max}$ for directed influence on $\hat\beta$. Elements 1-20 correspond to SALARY, elements 21-40 correspond to WCOLR, elements 41-60 correspond to SES, elements 61-80 correspond to TSCORE, and elements 81-100 correspond to MOMED. The two regressors that have little association with other explanatory variables, SALARY and TSCORE, have small elements. In contrast, some of the elements for the other regressors are more substantial.
This plot does a fair job of highlighting regressors, but a poor job of highlighting observations. Figure 3.11 is the profile plot of $|v_{\max}|$. It is easy to see that cases 3 and 18 have multiple regressor values that are influential. In addition, one of the values for case 2 is influential. However, it is difficult to keep track of which line represents which column. Another option is to plot "profiles" of each row, as in Figure 3.12. Here, some of the elements of observations 2, 3, and 18 are highlighted.

Figure 3.8: Hills data; X + W pert.; column profiles of directional vectors: (a) $|v_{\max}|$ for $\hat\sigma^2$, (b) $|v_{\max}|$ for $\hat\beta$, (c) $|v_2|$ for $\hat\beta$. DISTANCE values are connected by a solid line; CLIMB values are connected by a dashed line.

Figure 3.9: Hills data; Model II; X + W pert.; column profiles of directional vectors: (a) $|v_{\max}|$ for $\hat\sigma^2$, (b) $|v_{\max}|$ for $\hat\beta$, (c) $|v_2|$ for $\hat\beta$. DISTANCE values are connected by a solid line; CLIMB-RATE values are connected by a dashed line.

Figure 3.10: School data; X + W pert.; $v_{\max}$ for directed influence on $\hat\beta$.

Figure 3.11: School data; X + W pert.; column profiles of $|v_{\max}|$ for directed influence on $\hat\beta$. Thick solid line- SALARY; thin solid line- WCOLR; dot-dashed line- SES; small-dashed line- TSCORE; large-dashed line- MOMED.

Figure 3.12: School data; X + W pert.; row profiles of $|v_{\max}|$ for directed influence on $\hat\beta$, with large elements labeled.

A bubble plot of $|v_{\max}|$ provides the best graphical display of the information in $v_{\max}$ (Figure 3.13). Each element is represented by a point plotted according to both the corresponding regressor (x-axis) and case (y-axis).
Around each point is a bubble with radius equal to the absolute value of the element, and so large bubbles indicate large contributors to the direction of maximum curvature. Again, we see that the elements corresponding to $x_{18,2}$, $x_{18,5}$, $x_{3,2}$, $x_{3,5}$, and $x_{2,3}$ have the largest influence, as evidenced by having the largest bubbles.

Figure 3.13: School data; X + W pert.; bubble plot of $|v_{\max}|$ for $\hat\beta$.

CHAPTER 4
LOCAL ASSESSMENT FOR OPTIMAL ESTIMATING FUNCTIONS

It is possible to examine the local influence of perturbations on measures other than the likelihood displacement, $LD(\omega)$, and the profile likelihood displacement, $LD^{[\theta_1]}(\omega)$. Several authors have applied local influence techniques to specific measures of influence. For instance, Lawrance (1988) and Wu and Luo (1993c) looked at influence on the Box-Cox transformation parameter. Also, influence has been examined on the residual sums of squares in regression (Wu and Luo 1993b), deviance residuals and confidence regions in GLMs (Thomas and Cook 1990; Thomas 1990), and goodness-of-link test statistics in GLMs (Lee and Zhao 1997). However, considerably less attention has been given to assessing local influence when estimation is not performed by ML. Some work has been done on extensions to Bayesian analysis (McCulloch 1989; Lavine 1992; Pan et al. 1996) and $L_p$-norm estimation (Schall and Gonin 1991). In this chapter, local influence techniques are extended in two ways. First, we consider the influence of perturbations on a general class of influence measures. Second, we address parameter estimates that are found via estimating equations. The intent is to unify nearly all previous work on this methodology and provide a conceptual basis and computational tools for future applications.
The first three sections outline the inferential platform, a general class of influence measures, and assumptions about the perturbations. Next, techniques for local assessment of perturbations are discussed. In Section 4.4, we give some algebraic results for the two main diagnostic quantities: the gradient vector and the acceleration matrix. Section 4.5 contains computational formulas, while Sections 4.6-4.8 contain applications. Several measures, including $LD(\omega)$ and $LD^{[\theta_1]}(\omega)$, are considered in Section 4.6 using ML. Section 4.7 addresses influence on the generalized variance with an emphasis on ML. Finally, the last section considers quasi-likelihood.

4.1 Inferential Platform

This section outlines assumptions about estimation, the nature of perturbations, and how the effects of perturbation are measured.

4.1.1 Estimating Functions

Suppose that a statistical analysis is performed to estimate $p$ parameters, $\theta$. Further, suppose that point estimates, $\hat\theta$, are obtained as solutions to a set of $p$ unbiased estimating equations,

    $h(\theta; y) = 0,$    (4.1)

where for each estimating function, $h_i(\theta; y)$,

    $E_\theta(h_i(\theta; y)) = 0, \quad i = 1, \ldots, p.$

Sometimes these estimating equations arise from minimizing or maximizing some criterion with respect to $\theta$. Examples include least-squares normal equations, score equations, and equations resulting from minimizing a Bayes risk. Quasi-likelihood (QL) equations (Wedderburn 1974) were originally motivated by maximizing a quasi-likelihood function, but the existence of the objective function is not necessary. Generalized estimating equations (GEEs) (Liang and Zeger 1986), like QL equations, merely require assumptions about first and second moments. The theory of estimating equations (EEs) dates back to early work by V. P. Godambe (Godambe 1960; Godambe 1976).
There has been renewed interest in the subject due to new estimation techniques for semi-parametric models and models for clustered non-normal responses (Zeger and Liang 1986; Prentice 1988; McCullagh and Nelder 1989; Desmond 1997a). A considerable amount of work has centered on obtaining optimal estimating functions (OEFs) (Godambe 1976; Crowder 1986; Crowder 1987; Godambe and Heyde 1987; Firth 1987; Godambe and Thompson 1989; Godambe 1991). To this end, the efficiency matrix of a set of $p$ estimating functions is defined as

    $\mathrm{Eff}(h) = \left(E\left[\frac{\partial h}{\partial\theta'}\right]\right)^{-1} V[h] \left(E\left[\frac{\partial h'}{\partial\theta}\right]\right)^{-1}.$    (4.2)

Functions that minimize this non-negative definite matrix are considered optimal. Roughly speaking, this index of efficiency is small when the variance of the estimating functions is small and the average gradient is large. Provided that the estimating equations arise from maximizing or minimizing some function, large gradients indicate that the function is not flat around its extremum. When a fully parametric form for the data is assumed, the score functions are the unique OEFs up to a scalar multiple (Godambe 1960; Bhapkar 1972). However, with fewer assumptions about the data, it is generally not possible to find a unique set of optimal estimating functions. Instead, optimality is bestowed upon estimating functions within a certain class. One such class of estimating functions is those that are linear functions of the data. Under mild regularity conditions, the asymptotic variance of $\hat\theta$ is in fact given by (4.2) (Morton 1981; McCullagh 1983; Crowder 1986). Thus, minimizing $\mathrm{Eff}(h)$ is equivalent to minimizing the asymptotic variance of the parameter estimates. Note that the first interpretation of the efficiency matrix is appealing in that it is a finite-sample index, but somewhat awkward in that it is based on a property of the estimating functions.
The second interpretation is appealing in that it is based on a property of the estimates, but limited in that it requires asymptotics. For the purposes of applying local influence diagnostics we assume only that the estimating functions have been standardized. That is, given a set of unbiased estimating functions, $h^*(\theta; y)$, we utilize the standardized estimating functions given by

    $h(\theta; y) = -E\left[\frac{\partial h^{*\prime}}{\partial\theta}\right] (V[h^*])^{-1} h^*(\theta; y).$    (4.3)

This standardization has three implications. First, $(V[h(\theta; y)])^{-1}$ is equivalent to (4.2) and hence equivalent to the asymptotic variance of the parameter estimates. In addition, the standardized estimating functions are optimal in the following sense: given a set of unbiased estimating functions, $\{h_i^*(\theta; y)\}$, the OEFs amongst the class of linear combinations of the $h_i^*$ are given by (4.3). Hence, the standardized estimating functions are an optimal rescaling of the original estimating functions. Note that the negative sign is present in (4.3) so that the score equations, rather than the negative of the score equations, are considered standardized. Finally, standardized estimating functions are "information unbiased" (Lindsay 1982), meaning that the following equality holds:

    $-E\left[\frac{\partial h}{\partial\theta'}\right] = E[hh'] = V[h].$    (4.4)

Hence, the asymptotic variance of $\hat\theta$ can be estimated by $(-E[\partial h/\partial\theta'])^{-1}$. This fact is used later in the chapter to derive influence diagnostics for confidence ellipsoids.

4.1.2 Perturbations

Perturbations are again introduced as a vector of $q$ real numbers, $\omega$. These perturbations may affect the data, the model and its assumptions, or perhaps even just the estimation technique itself. Examples of data perturbations are minor changes or rounding errors in the response and the explanatory variables. An example of model perturbation is the variance perturbation scheme used in Chapters 1 and 2. Finally, perturbing the contribution of each observation to the estimating functions by case-weights is an example of perturbing the estimation technique.
This perturbation does not affect the distributional assumptions of the data, but rather how each observation is utilized for inference. Another perturbation to the inference method would be to allow the values of the hyper-parameters in a Bayesian analysis to vary. Though some might argue that the hyper-parameters are part of the model, their values are often chosen for computational convenience. For the purposes of this chapter, we assume that the perturbation scheme results in a set of $p$ perturbed estimating equations,

    $h_\omega(\theta; y) = 0,$    (4.5)

which have solutions $\hat\theta_\omega$. It is also assumed that there exists a null perturbation, $\omega_0$, whose estimating equations are identical to the original ones. Finally, it is assumed that there are no constraints on $\omega$ that would limit its values in a neighborhood of $\omega_0$. For example, constraining the space of variance perturbations to positive values is allowed. However, constraining the perturbations to sum to one is not allowed. These kinds of constraints could be accommodated, but are not considered here.

4.1.3 Influence Measures

The final item to specify is a quantity whose sensitivity to perturbation is of interest. This influence measure serves as a barometer for how perturbation affects inference. The measure can depend on $\omega$ directly and indirectly via some $r \times 1$ vector of functions of the perturbed parameter estimates, $\gamma(\hat\theta_\omega)$. Some influence measures, $m(\omega, \gamma(\hat\theta_\omega))$, may be appropriate regardless of the method of estimation and the perturbation, while others may be more application-specific. Examples of rather general objective measures are the following:

    $m_1 = x'\hat\theta_\omega$    (4.6)
    $m_2 = \hat\theta_i / \hat\theta_{i\omega}$    (4.7)
    $m_3 = \sum_i \frac{(y_i - \hat\mu_{i\omega})^2}{\hat\mu_{i\omega}},$    (4.8)

where $E_\omega(y_i) = \mu_{i\omega}$. None of these three measures depend on $\omega$ directly. The first is a linear combination of the perturbed parameter estimates. Here, $\gamma(\hat\theta_\omega) = x'\hat\theta_\omega$ is a scalar. The second measure is the ratio of the $i$-th unperturbed parameter estimate to the $i$-th perturbed estimate. Again, $\gamma(\hat\theta_\omega) = \hat\theta_{i\omega}$ is a scalar.
The third measure resembles a Pearson goodness-of-fit statistic based on the perturbed estimates. Since it is not yet specified how the means, $\mu_{i\omega}$, depend on the estimates, we simply write $\gamma(\hat\theta_\omega) = \hat\theta_\omega$. The following measures are more application-specific, in that they depend on how inference is performed and perhaps even upon the perturbation itself:

    $m_4 = LD(\omega) = 2[L(\hat\theta; y) - L(\hat\theta_\omega; y)]$    (4.9)
    $m_5 = LD^{[\theta_1]}(\omega) = 2[L(\hat\theta_1, \hat\theta_2; y) - L(\hat\theta_{1\omega}, g(\hat\theta_{1\omega}); y)]$    (4.10)
    $m_6 = LD^*(\omega) = 2[L_\omega(\hat\theta_\omega; y) - L_\omega(\hat\theta; y)]$    (4.11)
    $m_7 = E_{\hat\theta}\left[\log\frac{f(y \mid \hat\theta)}{f(y \mid \hat\theta_\omega)}\right]$    (4.12)
    $m_8 = \log|\hat{V}[\hat\theta_\omega]|.$    (4.13)

The first of these measures is the likelihood displacement, for which $\gamma(\hat\theta_\omega) = \hat\theta_\omega$. The second is the profile likelihood displacement, which has $\gamma(\hat\theta_\omega)' = (\hat\theta_{1\omega}', g(\hat\theta_{1\omega})')$, where $g(\hat\theta_{1\omega})$ is the value of $\theta_2$ that maximizes the likelihood when $\theta_1$ is fixed at $\hat\theta_{1\omega}$. The third measure is a likelihood displacement with a "moving frame" because the base likelihood of the measure is the perturbed one. Obviously, these three measures are intended for use when estimation is performed by maximum likelihood. The influence measure $m_7$ is a Kullback-Leibler divergence between the densities $f(y \mid \hat\theta)$ and $f(y \mid \hat\theta_\omega)$. The quantity is non-negative, achieving its minimum value of zero when the two densities are equivalent. Here, $\gamma(\hat\theta_\omega) = \hat\theta_\omega$. The last measure addresses the sensitivity of asymptotic standard errors. Recall that the determinant of a matrix equals the product of its eigenvalues, and when the matrix is a variance-covariance matrix for a vector of estimates, the square roots of its eigenvalues are proportional to the lengths of the principal axes of a confidence ellipsoid for $\theta$. Hence, the determinant of the variance provides a single measure of the confidence ellipsoid's size. Provided the estimating functions have been standardized, $m_8$ is the log of the estimated generalized asymptotic variance of $\hat\theta$. It is a function of $\omega$ both directly and indirectly via $\gamma(\hat\theta_\omega) = \hat\theta_\omega$. As a final note, it is not necessary for the influence measure to be a function of point estimators.
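For concreteness, measures such as (4.6) and (4.9) can be evaluated directly once a perturbation scheme is chosen. The sketch below is illustrative only: it assumes a case-weight perturbation of a unit-variance normal mean (synthetic data), for which the perturbed estimate has the closed form $\hat\theta_\omega = \sum_i \omega_i y_i / \sum_i \omega_i$.

```python
import numpy as np

# Illustrative assumption: case-weight perturbation of a normal mean with
# known unit variance; theta_hat(w) = sum(w*y)/sum(w), null w0 = (1,...,1).
rng = np.random.default_rng(4)
y = rng.normal(2.0, 1.0, size=10)
n = len(y)

def theta_hat(w):
    return np.sum(w * y) / np.sum(w)

def loglik(t):
    # log-likelihood up to an additive constant
    return -0.5 * np.sum((y - t) ** 2)

w0 = np.ones(n)
w = w0.copy()
w[0] = 0.5                      # down-weight the first case

m1 = theta_hat(w)               # measure (4.6) with x = 1
m4 = 2 * (loglik(theta_hat(w0)) - loglik(theta_hat(w)))   # LD(w), (4.9)

# LD is non-negative because theta_hat(w0) maximizes the unperturbed likelihood,
# and it is exactly zero at the null perturbation.
assert m4 >= 0
assert np.isclose(2 * (loglik(theta_hat(w0)) - loglik(theta_hat(w0))), 0.0)
```

The same template applies to any scheme in which $\hat\theta_\omega$ can be computed: only `theta_hat` changes.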
In Bayesian analyses, perturbation can be measured by the Kullback-Leibler distance between the perturbed and unperturbed posterior densities. Such a measure would depend on $\omega$, but not on any point estimates of $\theta$. McCulloch (1989) developed a general local influence technique for assessing perturbation of the prior distribution in a Bayesian analysis. Other types of perturbation analysis for Bayesian models are found in Kass et al. (1989), Geisser (1991), Geisser (1993), and Weiss (1996).

4.2 Local Assessment of Perturbations

At this point we have set the stage for our sensitivity analysis: a perturbation is introduced into the inference, and a measure of the effects is chosen. The next step is to obtain some useful diagnostics. This section describes how a Taylor series expansion of $m$ and some differential geometry results can be used to describe the function.

4.2.1 First-order Local Assessment

As in Chapter 1, let $\omega(a) = \omega_0 + av$ be a vector in the space of perturbations. The unit-length vector $v$ determines the direction from $\omega_0$, and the value $a$ determines the distance from $\omega_0$ in $\Omega$. The value of $m$ for a perturbation of length $a$ in direction $v$ can be approximated by a first-order Taylor-series expansion around the null perturbation:

    $m(a) \approx m(0) + \frac{dm(a)}{da}(a - 0) = m(0) + a\,\dot{m}'v,$    (4.14)

where the partial derivatives are evaluated at $a = 0$. The quantity $\dot{m}'v$ has a geometric interpretation. Consider the influence graph of $m$ (i.e., the influence function described as a surface in $\Re^{q+1}$). For each direction $v$, there is an associated lifted line on the influence graph. The slope of this "slice" of the graph is given by

    $S = \dot{m}'v,$    (4.15)

and it is sometimes called the directional derivative. The direction with the largest absolute slope is proportional to $\dot{m}$, the gradient vector. Both the directional derivative and the gradient vector have diagnostic value.
To explain, recall from Chapter 2 that we may use a perturbation's Euclidean distance from the null perturbation, $\lVert\omega - \omega_0\rVert$, as a measure of its size. Further, let us define the effect of the perturbation as $|m(a) - m(0)|$. If $m(0) = 0$, the effect of a perturbation is the absolute value of $m(a)$. The directional derivative can then be interpreted as the approximate effect of a size-1 perturbation in direction $v$. Furthermore, suppose that we have perturbations in two different directions: $\omega_A$ in direction $v_A$ and $\omega_B$ in direction $v_B$. If $|S_A|$ is $k$ times larger than $|S_B|$, then perturbation $\omega_B$ must be $k$ times larger than $\omega_A$ to have the same effect. Therefore, directional derivatives are measures of the relative influence of perturbations. Additionally, the gradient vector shows how perturbations can be scaled to achieve maximum local influence. That is, $\omega_0 \pm \dot{m}/\lVert\dot{m}\rVert$ has the largest approximate effect, $\lVert\dot{m}\rVert$, among perturbations of size 1. If $m(0)$ is a local minimum, maximum or saddlepoint of the influence graph, then the gradient is zero. Here, a first-order local assessment does not provide any diagnostic information. Instead, we can turn to a second-order approximation of the influence graph.

4.2.2 Second-order Local Assessment

A second-order Taylor series expansion of $m$ is given by

    $m(a) \approx m(0) + \frac{dm(a)}{da}(a - 0) + \frac{1}{2}\frac{d^2m(a)}{da^2}(a - 0)^2 = m(0) + a\,\dot{m}'v + \frac{a^2}{2}v'\ddot{M}v,$    (4.16)

where the derivatives are again evaluated at $a = 0$. The quantity $A = v'\ddot{M}v$ also has a geometric interpretation: it is the rate of change in the slope of the lifted line associated with direction $v$, i.e., it is the acceleration of the curve. The matrix $\ddot{M}$ is the acceleration matrix, and the eigenvector associated with its largest absolute eigenvalue is the direction of maximum absolute acceleration. If the gradient vanishes, the second-order approximation simplifies to

    $m(a) \approx m(0) + \frac{a^2}{2}v'\ddot{M}v.$    (4.17)
Both the acceleration and the acceleration matrix have diagnostic value, particularly when the gradient vanishes. In this case, $A$ can be interpreted as the approximate effect of a size-$\sqrt{2}$ perturbation in direction $v$. In addition, suppose that we have perturbations in two different directions: $\omega_A$ and $\omega_B$. If $|A_A|$ is $k$ times larger than $|A_B|$, then perturbation $\omega_B$ must be $\sqrt{k}$ times larger than $\omega_A$ to have the same effect. Finally, the eigensystem of the acceleration matrix can be used to identify influential perturbations. The largest absolute eigenvalue of $\ddot{M}$ is the maximum approximate effect of a size-$\sqrt{2}$ perturbation, and the associated eigenvector is the direction of maximum approximate effect. Other eigensolutions can also be used.

4.3 Acceleration, Curvature and Normal Curvature

Two more geometric descriptions of the influence measure can be calculated: curvatures and normal curvatures. These quantities are functions of both the gradient vector and the acceleration matrix. When the gradient vanishes, the curvature, the normal curvature, and the absolute value of the acceleration associated with a directional vector are all equivalent. From (1.5), the curvature associated with direction $v$ at the null perturbation is

    $C = \left.\frac{\left|\dfrac{d^2m(a)}{da^2}\right|}{\left(1 + \left(\dfrac{dm(a)}{da}\right)^2\right)^{3/2}}\right|_{a=0}.$    (4.18)

Using (4.16), it follows that

    $C = \frac{|v'\ddot{M}v|}{(1 + v'\dot{m}\dot{m}'v)^{3/2}} = \frac{|v'\ddot{M}v|}{(v'(I_q + \dot{m}\dot{m}')v)^{3/2}}.$    (4.19)

If the slope is zero, then $C = |A|$. As mentioned previously, curvature is a measure of how quickly a curve turns. The notion behind normal curvature is a bit more complicated. The vector $(\omega(a)', 0)'$ in $\Re^{q+1}$ is projected onto the tangent plane of the influence graph rather than the graph itself. A plane is then created that contains both the projection as well as the vector that is orthogonal to the tangent plane, the principal normal vector. This plane then intersects the surface to create a normal section.
Using the tangent plane as the basis, the curvature of the normal section can be computed as if it were a simple plane curve. The resulting normal curvature is a description of how the influence graph deviates from its tangent plane. The normal curvature associated with direction $v$ at the null perturbation is (Cook 1986)

    $NC = \frac{|v'\ddot{M}v|}{(1 + \lVert\dot{m}\rVert^2)^{1/2}\, v'(I_q + \dot{m}\dot{m}')v}.$    (4.20)

If the gradient is zero, then $NC = C = |A|$. The two geometric descriptions $NC$ and $C$ can be distinguished as follows. Curvature is the reciprocal of the radius of the best-fitting circle to the lifted line associated with $v$, while normal curvature is the reciprocal of the radius of the best-fitting circle to the normal section associated with $v$. If the gradient is zero, lifted lines are normal sections, implying that $NC = C = |A|$. Additional insights are found in Cook (1986), Wu and Luo (1993a), Wu and Luo (1993b), and Fung and Kwan (1997).

4.3.1 How to Use Local Assessment of Perturbations

Local assessment of perturbations can be used to (1) assess the influence of aspects of the statistical problem on inferences, and (2) compare the relative sensitivity of parameter estimates. First, we define the local influence associated with a length-one vector $v$ to be

    $li(v) = |\dot{m}'v|$ if $\dot{m} \neq 0$, and $li(v) = |v'\ddot{M}v|$ if $\dot{m} = 0.$    (4.21)

Thus, $li(v)$ is simply the directional derivative unless the gradient vanishes, in which case it is the normal curvature. The direction of maximum local influence, $v_{\max}$, is the main diagnostic tool, and it can be displayed graphically using an index plot or star plot to informally assess the influence of perturbations. Additionally, the local influences of basic perturbations can be plotted against case indices to compare the effects of one-at-a-time perturbations. The relative sensitivity of $p$ parameter estimates can be assessed by plotting the $p$ maximum local influences for $m_1 = \hat\theta_{1\omega}, \ldots, m_p = \hat\theta_{p\omega}$. Measures with the largest maximum local influences are most sensitive to the perturbation.
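The second branch of (4.21) — a vanishing gradient, with $v_{\max}$ taken from the eigensystem of the acceleration matrix — can be illustrated numerically. The sketch below is a minimal, assumed example (synthetic data): a case-weight perturbation of a unit-variance normal mean with the likelihood displacement as the measure, and derivatives replaced by finite differences.

```python
import numpy as np

# Assumed scheme: case weights w on a normal-mean model with unit variance,
# theta_hat(w) = sum(w*y)/sum(w); the measure is the likelihood displacement
# m(w) = n*(theta_hat(w) - theta_hat(1))^2, whose gradient vanishes at the null.
rng = np.random.default_rng(1)
y = rng.normal(size=6)
n = len(y)
theta = lambda w: np.sum(w * y) / np.sum(w)
m = lambda w: n * (theta(w) - theta(np.ones(n))) ** 2

w0, eps = np.ones(n), 1e-4
E = np.eye(n)

# Central-difference gradient: it should be (numerically) zero at the null.
grad = np.array([(m(w0 + eps * E[j]) - m(w0 - eps * E[j])) / (2 * eps)
                 for j in range(n)])
assert np.allclose(grad, 0.0, atol=1e-6)

# Numerical acceleration matrix (Hessian of m at w0, four-point formula).
M = np.empty((n, n))
for i in range(n):
    for j in range(n):
        M[i, j] = (m(w0 + eps * (E[i] + E[j])) - m(w0 + eps * (E[i] - E[j]))
                   - m(w0 - eps * (E[i] - E[j])) + m(w0 - eps * (E[i] + E[j]))) / (4 * eps ** 2)

# For this scheme M = (2/n)(y - ybar)(y - ybar)', so the direction of maximum
# absolute acceleration is proportional to the centered responses.
vals, vecs = np.linalg.eigh(M)
vmax = vecs[:, np.argmax(np.abs(vals))]
r = (y - y.mean()) / np.linalg.norm(y - y.mean())
assert np.allclose(np.abs(vmax @ r), 1.0, atol=1e-3)
```

Here $li(v_{\max})$ is the largest absolute eigenvalue of the numerical $\ddot{M}$; plotting $|v_{\max}|$ against case indices gives the usual index-plot diagnostic.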
Two cautions with this technique should be noted. First, the metrics of the parameter estimates ought to be comparable. If the metrics are not comparable, standardized parameter estimates should be used. The second caution pertains to first- and second-order influence. Typically, the set of $p$ measures is assessed by a first-order approximation. However, if a subset of the measures requires second-order assessment, then the following revised definition can be used to make the local influences comparable:

    $li^*(v) = |\dot{m}'v|$ if $\dot{m} \neq 0$, and $li^*(v) = \tfrac{1}{2}|v'\ddot{M}v|$ if $\dot{m} = 0.$    (4.22)

This ensures that all influence measures are compared by the approximate effects of length-one perturbations.

4.4 Algebraic Expressions for Local Assessment

In this section, algebraic expressions for local influence diagnostics are derived.

4.4.1 First-order Assessment

Theorem 4.1
The directional derivative in direction $v$ of influence measure $m(\omega, \gamma(\hat\theta_\omega))$ at the null perturbation is given by

    $S = (s_\omega' + s_\gamma' K J)v,$    (4.23)

where

    $s_\omega = \frac{\partial m(\omega, \gamma(\hat\theta))}{\partial\omega}\Big|_0, \quad s_\gamma = \frac{\partial m(\omega, \gamma(\hat\theta_\omega))}{\partial\gamma(\hat\theta_\omega)}\Big|_0, \quad K = \frac{\partial\gamma(\hat\theta_\omega)}{\partial\hat\theta_\omega'}\Big|_0, \quad J = \frac{\partial\hat\theta_\omega}{\partial\omega'}\Big|_0,$

and $|_0$ denotes evaluation at the unperturbed fitted model: $\omega = \omega_0$ and $\hat\theta_\omega = \hat\theta$.

Proof of Theorem 4.1
Taking partial derivatives of $m$, we find

    $\frac{dm}{da} = \frac{\partial m(\omega, \gamma(\hat\theta_\omega))}{\partial\omega'}\frac{\partial\omega}{\partial a} + \frac{\partial m(\omega, \gamma(\hat\theta_\omega))}{\partial\gamma(\hat\theta_\omega)'}\frac{\partial\gamma(\hat\theta_\omega)}{\partial\hat\theta_\omega'}\frac{\partial\hat\theta_\omega}{\partial\omega'}\frac{\partial\omega}{\partial a}.$

Evaluating at the original model completes the proof. □

Corollary 4.1
The gradient is given by

    $\dot{m} = s_\omega + J'K's_\gamma.$    (4.24)

4.4.2 Second-order Assessment

In order to obtain a general expression for the acceleration, a large number of partial derivatives are needed. The following lemma provides an identity used to derive the algebraic expression for $\ddot{M}$.

Lemma 4.1
Define the following:

    $T_i = \frac{\partial^2\gamma_i(\hat\theta_\omega)}{\partial\omega\,\partial\omega'}\Big|_0, \quad G_i = \frac{\partial^2\gamma_i(\theta)}{\partial\theta\,\partial\theta'}\Big|_{\hat\theta}, \quad P_i = \frac{\partial^2\hat\theta_{i\omega}}{\partial\omega\,\partial\omega'}\Big|_0.$

Then

    $T = [I_r \otimes J']\,G\,J + [K \otimes I_q]\,P,$

where $T$ ($rq \times q$) stacks $T_1, \ldots, T_r$; $G$ ($rp \times p$) stacks $G_1, \ldots, G_r$; $P$ ($pq \times q$) stacks $P_1, \ldots, P_p$; and where $J$ and $K$ were defined in Theorem 4.1.
Proof of Lemma 4.1
Consider

    $\frac{\partial^2\gamma_i(\hat\theta_\omega)}{\partial\omega\,\partial\omega'} = \frac{\partial}{\partial\omega}\left[\frac{\partial\gamma_i(\hat\theta_\omega)}{\partial\hat\theta_\omega'}\frac{\partial\hat\theta_\omega}{\partial\omega'}\right] = \frac{\partial\hat\theta_\omega'}{\partial\omega}\frac{\partial^2\gamma_i(\hat\theta_\omega)}{\partial\hat\theta_\omega\,\partial\hat\theta_\omega'}\frac{\partial\hat\theta_\omega}{\partial\omega'} + \sum_{j=1}^{p}\frac{\partial\gamma_i(\hat\theta_\omega)}{\partial\hat\theta_{j\omega}}\frac{\partial^2\hat\theta_{j\omega}}{\partial\omega\,\partial\omega'}.$

Evaluating this equality at the null perturbation yields

    $T_i = J'G_iJ + \sum_{j=1}^{p} K_{ij}P_j.$    (4.25)

Stacking the $T_i$ to form $T$ completes the proof:

    $T = [I_r \otimes J']\,G\,J + [K \otimes I_q]\,P.$ □

Theorem 4.2
The acceleration of influence measure $m(\omega, \gamma(\hat\theta_\omega))$ in direction $v$ at the null perturbation is given by

    $A = v'\ddot{M}v,$    (4.26)

where

    $\ddot{M} = [I_q \;\; J'K']\,\ddot{S}\begin{bmatrix} I_q \\ KJ \end{bmatrix} + [s_\gamma' \otimes J']\,G\,J + [s_\gamma'K \otimes I_q]\,P$

and

    $\ddot{S} = \begin{bmatrix} \dfrac{\partial^2 m}{\partial\omega\,\partial\omega'} & \dfrac{\partial^2 m}{\partial\omega\,\partial\gamma(\hat\theta_\omega)'} \\ \dfrac{\partial^2 m}{\partial\gamma(\hat\theta_\omega)\,\partial\omega'} & \dfrac{\partial^2 m}{\partial\gamma(\hat\theta_\omega)\,\partial\gamma(\hat\theta_\omega)'} \end{bmatrix}\Bigg|_0.$    (4.27)

Alternatively, we may write

    $\ddot{M} = [I_q \;\; J'K']\,\ddot{S}\begin{bmatrix} I_q \\ KJ \end{bmatrix} + \sum_{i=1}^{r} s_{\gamma_i}T_i = [I_q \;\; J'K']\,\ddot{S}\begin{bmatrix} I_q \\ KJ \end{bmatrix} + J'\Big[\sum_{i=1}^{r} s_{\gamma_i}G_i\Big]J + \sum_{i=1}^{p}(s_\gamma'K_i)P_i,$    (4.28)

where $s_{\gamma_i}$ is the $i$-th element of $s_\gamma$ and $K_i$ is the $i$-th column of $K$.

Proof of Theorem 4.2
First we define the following $(q + r) \times 1$ vector:

    $\delta = \begin{pmatrix} \omega \\ \gamma(\hat\theta_\omega) \end{pmatrix}.$

Then

    $\frac{d^2m}{da^2} = v'\left[\frac{\partial\delta'}{\partial\omega}\frac{\partial^2 m(\delta)}{\partial\delta\,\partial\delta'}\frac{\partial\delta}{\partial\omega'} + \sum_{i=1}^{q+r}\frac{\partial m(\delta)}{\partial\delta_i}\frac{\partial^2\delta_i}{\partial\omega\,\partial\omega'}\right]v,$    (4.29)

where the sum over the first $q$ elements of $\delta$ vanishes because $\partial^2\omega_i/\partial\omega\,\partial\omega' = 0$. Examining the first term and evaluating at the null perturbation, we find

    $\frac{\partial\delta'}{\partial\omega}\frac{\partial^2 m(\delta)}{\partial\delta\,\partial\delta'}\frac{\partial\delta}{\partial\omega'}\Big|_0 = [I_q \;\; J'K']\,\ddot{S}\begin{bmatrix} I_q \\ KJ \end{bmatrix},$    (4.30)

since $\partial\delta/\partial\omega'|_0 = [I_q' \;\; (KJ)']'$. Examining the second term and evaluating at the null perturbation, we find

    $\sum_{i=1}^{r} s_{\gamma_i}\frac{\partial^2\gamma_i(\hat\theta_\omega)}{\partial\omega\,\partial\omega'}\Big|_0 = [s_\gamma' \otimes I_q]\,T = [s_\gamma' \otimes J']\,G\,J + [s_\gamma'K \otimes I_q]\,P$    (4.31)

by Lemma 4.1. □

4.4.3 Comments

While the algebraic expression for $S$ is quite simple, the expression for $A$ is somewhat cumbersome.
However, a second-order assessment would typically be used only when the gradient vector vanishes. In most situations, then, $s_\gamma = 0$, and the acceleration matrix simplifies to

    $\ddot{M} = [I_q \;\; J'K']\,\ddot{S}\begin{bmatrix} I_q \\ KJ \end{bmatrix}.$    (4.32)

The algebraic expressions are helpful in that they decompose the diagnostics into building blocks, most of which are immediately computable. Specifically, the quantities $s_\omega$, $s_\gamma$, $\ddot{S}$, $G$, and $K$ should be immediate upon choosing an influence measure and fitting the model. In the next section, it is shown that the matrices $J$ and $P$ are also easily obtained in most situations.

4.5 Computational Formulas for J and P

In order for the formulas for the gradient and acceleration matrix to be of practical use, they must be easily computed in a wide range of applications. In this section, it is shown that the matrices $J$ and $P$ can be computed even when the unperturbed parameter estimates are not closed-form. The only requirement is that the perturbed estimating functions be closed-form expressions of $\omega$.

4.5.0.1 Computation of J

The following theorem generalizes equality (14) in Cook (1986).
Not only is it a by-product of using the Newton-Raphson root-finding technique to solve the estimating equations, but it serves as an estimate of the asymptotic variance of 0, the inverse of (4.4). When estimation is performed by ML, -h~^ is the inverse of 105 the observed information matrix. The matrix A is also easily computed provided that the expressions for the perturbed estimating functions are known. This computation was demonstrated many times in Chapters 1-3 for maximum likelihood. Computation of A could be difficult when the estimating equations themselves are not closed-form. An example would be ML inference for generalized linear mixed models (McCulloch 1997). 4.5.1 Computation of P In this section, a formula for P is given so that it, like J, it can be computed without re-estimating the model. To begin, we present the following lemma, which provides an identity used to construct P. Lemma 4.2 Let 0^ be the solutions to the p perturbed estimating equations hcj{9; y) = 0. where -h'i^Ig P = I, J' *^ I, J l,...p, hi = the i*^ column of h *i Qi Ri (4.36) Qi d'hUe;y) 9u;9a;' Ri = d^hiU0;y) dujdO' hi = d'hiU0;y) ■''^' dOdO' .-■•' ■'• lif J = -h A. 106 Proof of Lemma 4.2 Taking the z*'* row of (4.35) in the proof of Theorem 4.3 gives the following equality: dhi^{e\y) d(jj' dhi^{e^;y)d0^ 0'. e^e^ del du^ where 0' is a 1 x g matrix. Taking derivatives of the first term on the left side of this equality with respect to a;, we find (4.37) du: dhi^{e;y) du}' 0=0^ d^hU0;y) dijjduj' d0^d'hU0;y) 0=0 ^^ d0dij}' 0=0^ (4.38) Taking partial derivatives of the second term with respect to u;, we find _9_ d(jj dhi^{0^;y)d0u 80',, 9u:' d'hi^{0-y) du:d0' + d0^d'hiUO;y) du} 0080' 9=0^ 8uj' 8hi^{0^;y) 89 j (4.39) 8u}8u}' Using (4.38) and (4.39), the next step is to take derivatives of the equality in (4.37) with respect to oj and then evaluate the result at the null perturbation. 
It follows that d_ 8(jj 8hi^{0;y) 8U3' dhi^{0^;y)80a 0=0^ 80, 8u}' d'hUO;y) dujduj' + d0^d^hUe\y) Q^Q^ 8u} 808(jj' + 8^hi,{0;y) 8lj80' 0=0 Jo 80^ , 80,8%^{0-y) Q^Q 8u}' 8uj 8080' 0=0. 00^ 8U3' E j=\ dhi^{0^;y) 8di, 89, to, dujduj',^ (4.40) = Qi + J'R'i + RiJ + J'hiJ + Y^ dhiu{0;y) I J' Qi Ri Hi hi + where is now a, p x p matrix. The lemma follows immediately. D I J 89, h: ® I, 107 The theorem below now provides a formula for P that does not require re-estimation of the model. Theorem 4.4 *^ ^ s : Let 0^ he the solutions to the p perturbed estimating equations hu{0; y) = 0. Then, P = -1 h ®[la J'] * / J (4.41) where * = *i *. (4.42) Proof of Theorem 4.4 Taking the p sets of equalities of the form given in Lemma 4.2, we find ., -hi (8) Jq hp ® Ig Manipulation of the left side of the equality proceeds as: Ig J' *1 I, J • /, J' J (4.43) -^'l ® Iq h\\Iq hi2lq • p = h^llq h22lq -h'p (2) Ig = -h®Ig P. 108 Manipulation of the right side of the equality proceeds as: I, J' \ *1 I, J r lo J' *1 r 1 \ L J h r -] r -| L J J I, J' *p J - h J' [I, J'] [la J'] Ip^ila J'] * It then follows from (4.43) that h^Ir, h ®In "W Ip^lla J'] * * 1 ^7 Ip^ I, J' * J J J'] * I, . n -1 J Corollary 4.2 Pi = Y^i-h'^) p Ig J' *. = Yi-h''){Qj + RjJ + J'R'j + J'hjJ) 109 4.5. 1.1 Remarks The formula for P can be combined with the results of Theorem 4.2 to further revise the acceleration matrix: M = dujduj' I J'K' I J'K' I KJ I KJ + [s; ® J'] GJ + [s'^K ® /,] P + [s; ® J'] GJ s'^K ® Jj h~ ® [ J^ J'] * I J I J'K' I KJ + [s; ® J'] GJ + -s'^Kh ®[T J' * I J Also, we may write p P 1=1 p p /, J' *. ^. J = EEK^^)(-^'') j=l i=l /, J' *. -*9 J = Y^{s'^K{-h-;)) j=i /, J' *, (4.44) (4.45) Hence, another form of the acceleration is given by M I J'K' I KJ ^J'[Y.s„G,]J i=\ i^^Kh, ') I. J' *i I, i=i L J J Note that in all computations, J = —h A as per Theorem 4.3. 110 (4.46) 4.6 Local Assessment for Maximum Likelihood Inference In this section some brief examples when estimation is performed by ML are given. 
The influence measures addressed include linear combinations of parameter estimates, ratios of parameter estimates, the likelihood displacement, the profile likelihood displacement, and the Kullback-Leibler divergence. Emphasis is placed on constructing the building blocks of local influence. From Theorem 4.3, $J = -\ddot{L}^{-1}\Delta$. The individual measures determine $\gamma(\hat{\theta}_\omega)$ and the remaining building blocks:
$$s_\omega = \frac{\partial m(\omega,\gamma(\hat{\theta}))}{\partial\omega}\bigg|_0, \qquad K = \frac{\partial\gamma(\theta)}{\partial\theta'}\bigg|_{\hat{\theta}}, \qquad s_\gamma = \frac{\partial m(\omega_0,\gamma)}{\partial\gamma}\bigg|_{\gamma(\hat{\theta})}.$$

4.6.1 Linear Combinations

Consider a linear combination of the parameter estimates under perturbation: $m = x'\hat{\theta}_\omega$. The building blocks of first-order local assessment are
$$\gamma(\hat{\theta}_\omega) = x'\hat{\theta}_\omega, \qquad s_\omega = 0, \qquad K = x', \qquad s_\gamma = 1.$$
Construction of the gradient and directional derivative is immediate:
$$\dot{m}'v = s_\gamma K J v = -x'\ddot{L}^{-1}\Delta v.$$
Note that the direction of maximum absolute slope is proportional to $\dot{m} \propto \Delta'\ddot{L}^{-1}x$. This is exactly the same as the direction of largest curvature for the profile likelihood displacement of $x'\hat{\theta}_\omega$ under the orthogonal parameterization given in Theorem 2.1.

4.6.2 Ratio of Estimates

For the measure $m = \hat{\theta}_{i\omega}/\hat{\theta}_i$, the building blocks for a first-order local assessment are
$$\gamma(\hat{\theta}_\omega) = \hat{\theta}_{i\omega}, \qquad s_\omega = 0, \qquad K = d_i', \qquad s_\gamma = \frac{1}{\hat{\theta}_i},$$
where $d_i$ is a $p\times 1$ vector with a 1 in the $i$th position and zeroes elsewhere. Hence,
$$\dot{m}'v = (s_\omega' + s_\gamma K J)v = -\frac{1}{\hat{\theta}_i}\,d_i'\ddot{L}^{-1}\Delta v = -\frac{1}{\hat{\theta}_i}\,(\ddot{L}^{i})'\Delta v,$$
where $\ddot{L}^{i}$ is the $i$th column of $\ddot{L}^{-1}$.

4.6.3 Likelihood Displacement

Recall that the likelihood displacement is given by
$$m(\omega,\gamma(\hat{\theta}_\omega)) = 2\left[L(\hat{\theta};y) - L(\hat{\theta}_\omega;y)\right].$$
This influence measure requires a second-order local assessment, as is now shown. First, it does not depend on $\omega$ explicitly, so $s_\omega$ vanishes. Since the exact form of the likelihood has not been specified, we simply write $\gamma(\hat{\theta}_\omega) = \hat{\theta}_\omega$. It follows that
$$\frac{\partial m(\omega,\gamma(\hat{\theta}_\omega))}{\partial\gamma}\bigg|_0 = -2\,\frac{\partial L(\theta;y)}{\partial\theta}\bigg|_{\hat{\theta}} = 0.$$
Hence, the gradient vector is zero. Proceeding with a second-order assessment, we first note that $K = I_p$. Because $s_\gamma = 0$, the terms of (4.44) involving $G$ and $P$ vanish, and it follows that
$$M = \begin{bmatrix} I_q & J' \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & -2\ddot{L} \end{bmatrix}\begin{bmatrix} I_q \\ J \end{bmatrix} = J'(-2\ddot{L})J = \Delta'\ddot{L}^{-1}(-2\ddot{L})\ddot{L}^{-1}\Delta = -2\Delta'\ddot{L}^{-1}\Delta.$$
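The closed form $M = -2\Delta'\ddot{L}^{-1}\Delta$ can be checked numerically. The sketch below is a minimal illustration, not part of the dissertation's examples: it simulates a normal linear regression, applies the case-weight perturbation $L_\omega = \sum_i \omega_i \ell_i$, builds $\Delta$ and $\ddot{L}$ at the MLEs, and compares a few entries of $M$ with a finite-difference Hessian of $LD(\omega)$ at the null perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 3                       # p regression coefficients, plus sigma^2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def ml_fit(w):
    # Maximize sum_i w_i * l_i(beta, sigma^2): weighted LS + weighted variance
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    r = y - X @ beta
    return beta, (w * r**2).sum() / w.sum()

def loglik(beta, s2):
    r = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * s2) - (r**2).sum() / (2 * s2)

beta_h, s2_h = ml_fit(np.ones(n))
L_hat = loglik(beta_h, s2_h)
r = y - X @ beta_h

# Building blocks at the MLEs: Delta is (p+1) x n, L-double-dot is (p+1) x (p+1)
Delta = np.vstack([X.T * r / s2_h, r**2 / (2 * s2_h**2) - 1 / (2 * s2_h)])
Ldd = np.zeros((p + 1, p + 1))
Ldd[:p, :p] = -X.T @ X / s2_h
Ldd[p, p] = -n / (2 * s2_h**2)

M = -2 * Delta.T @ np.linalg.solve(Ldd, Delta)   # acceleration matrix

def LD(w):
    return 2 * (L_hat - loglik(*ml_fit(w)))

# Central second differences of LD(omega) at omega_0 = 1_n for a few entries
h = 1e-3
for (i, j) in [(0, 0), (0, 5), (3, 7)]:
    w = lambda si, sj: np.ones(n) + si * h * np.eye(n)[i] + sj * h * np.eye(n)[j]
    num = (LD(w(1, 1)) - LD(w(1, -1)) - LD(w(-1, 1)) + LD(w(-1, -1))) / (4 * h**2)
    assert abs(num - M[i, j]) < 1e-3 * max(1.0, abs(M[i, j]))
```

The agreement reflects the fact that the gradient of $LD$ vanishes at the null perturbation, so the Hessian there is exactly $J'(-2\ddot{L})J$.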
This is equivalent to the result given in Section 2.1, as per Cook (1986).

4.6.4 Profile Likelihood Displacement

The profile likelihood displacement,
$$m(\omega,\gamma(\hat{\theta}_\omega)) = 2\left[L(\hat{\theta}_1,\hat{\theta}_2;y) - L(\hat{\theta}_{1\omega}, g(\hat{\theta}_{1\omega});y)\right],$$
also requires a second-order assessment. First, the influence measure does not depend on $\omega$ explicitly, so $s_\omega$ vanishes. Also,
$$\frac{\partial m(\omega,\gamma(\hat{\theta}_\omega))}{\partial\gamma}\bigg|_0 = -2\,\frac{\partial L(\theta;y)}{\partial\theta}\bigg|_{\hat{\theta}} = 0.$$
This implies that the gradient vector is zero.

The next step is to construct the building blocks of a second-order assessment. In order to obtain a formula for $K$, we present the following result, which appeared in Cook (1986) as equation (24):
$$\frac{\partial g(\theta_1)}{\partial\theta_1'} = -\ddot{L}_{22}^{-1}\ddot{L}_{21}, \qquad \text{where } \ddot{L} = \begin{bmatrix} \ddot{L}_{11} & \ddot{L}_{12} \\ \ddot{L}_{21} & \ddot{L}_{22} \end{bmatrix}. \eqno(4.47)$$
Using this result, we find
$$K = \frac{\partial\gamma(\theta)}{\partial\theta'}\bigg|_{\hat{\theta}} = \begin{bmatrix} I & 0 \\ -\ddot{L}_{22}^{-1}\ddot{L}_{21} & 0 \end{bmatrix}. \eqno(4.48)$$
It follows that the acceleration matrix for the profile likelihood displacement is
$$M^{[\theta_1]} = \begin{bmatrix} I_q & J'K' \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & -2\ddot{L} \end{bmatrix}\begin{bmatrix} I_q \\ KJ \end{bmatrix} = -2J'K'\ddot{L}KJ = -2\Delta'\ddot{L}^{-1}\left(\ddot{L}_{11} - \ddot{L}_{12}\ddot{L}_{22}^{-1}\ddot{L}_{21}\right)_{\!11}\ddot{L}^{-1}\Delta = -2\Delta'\left(\ddot{L}^{-1} - \begin{bmatrix} 0 & 0 \\ 0 & \ddot{L}_{22}^{-1} \end{bmatrix}\right)\Delta,$$
where the subscript 11 indicates that the Schur complement $\ddot{L}_{11} - \ddot{L}_{12}\ddot{L}_{22}^{-1}\ddot{L}_{21}$ occupies the upper-left block of an otherwise zero matrix. This is equivalent to the formula given in Section 2.1, as per Cook (1986).

4.6.5 Kullback-Leibler Divergence

The Kullback-Leibler divergence,
$$m = \mathrm{E}_{\hat{\theta}}\left[\log\frac{f(y|\hat{\theta})}{f(y|\hat{\theta}_\omega)}\right],$$
also requires a second-order assessment. The measure does not depend on $\omega$ explicitly, so $s_\omega$ vanishes. Also, $\gamma(\hat{\theta}_\omega) = \hat{\theta}_\omega$, implying
$$\frac{\partial m(\omega,\gamma(\hat{\theta}_\omega))}{\partial\gamma}\bigg|_0 = -\mathrm{E}_{\hat{\theta}}\left[\frac{\partial L(\theta;y)}{\partial\theta}\bigg|_{\hat{\theta}}\right] = -\mathrm{E}_{\hat{\theta}}[0] = 0.$$
Proceeding with a second-order assessment, we first note that $K = I_p$. We also make the following observation:
$$\frac{\partial^2 m}{\partial\gamma\,\partial\gamma'} = -\frac{\partial^2}{\partial\theta\,\partial\theta'}\,\mathrm{E}_{\hat{\theta}}\left[L(\theta;y)\right].$$
Evaluating this at the null perturbation implies
$$\frac{\partial^2 m}{\partial\gamma\,\partial\gamma'}\bigg|_0 = -\mathrm{E}_{\hat{\theta}}\left[\frac{\partial^2 L(\theta;y)}{\partial\theta\,\partial\theta'}\right]\bigg|_{\hat{\theta}}, \eqno(4.49)$$
and the acceleration matrix follows:
$$M = -\Delta'\ddot{L}^{-1}\,\mathrm{E}_{\hat{\theta}}\left[\frac{\partial^2 L(\theta;y)}{\partial\theta\,\partial\theta'}\right]\bigg|_{\hat{\theta}}\,\ddot{L}^{-1}\Delta. \eqno(4.50)$$
An interesting consequence of this result is the following: if the observed and expected information matrices are equal, then the acceleration matrix is proportional to the acceleration matrix of the likelihood displacement.

4.7 First-order Assessment of the Log Generalized Variance

In this section, some general results for first-order local assessment of the log of the generalized variance are presented. In addition, the directional derivative for the variance perturbation of a regression model under ML is derived.

4.7.1 General Form of the Directional Derivative

Theorem 4.5. For the measure
$$m = \log\left|\left(-\dot{h}_\omega(\hat{\theta}_\omega;y)\right)^{-1}\right|, \eqno(4.51)$$
the $k$th element of the vector $s_\omega$ is given by
$$-\mathrm{trace}\left(\frac{\partial^2 h_\omega(\theta;y)}{\partial\omega_k\,\partial\theta'}\bigg|_0\,\dot{h}^{-1}\right), \qquad k = 1,\dots,q, \eqno(4.52)$$
and the $j$th element of the vector $s_\gamma$ is given by
$$-\mathrm{trace}\left(\frac{\partial^2 h(\theta;y)}{\partial\theta_j\,\partial\theta'}\bigg|_{\hat{\theta}}\,\dot{h}^{-1}\right), \qquad j = 1,\dots,p. \eqno(4.53)$$
Additionally, we may write
$$s_\omega = R\,\mathrm{vec}(-\dot{h}^{-1}), \qquad R = \begin{bmatrix} R_1 & R_2 & \cdots & R_p \end{bmatrix}, \eqno(4.54)$$
where each $R_j$ is a $q\times p$ matrix of the form
$$R_j = \frac{\partial^2 h_{j\omega}(\theta;y)}{\partial\omega\,\partial\theta'}\bigg|_0, \qquad j = 1,\dots,p.$$
Similarly,
$$s_\gamma = \ddot{h}\,\mathrm{vec}(-\dot{h}^{-1}), \qquad \ddot{h} = \begin{bmatrix} \ddot{h}_1 & \ddot{h}_2 & \cdots & \ddot{h}_p \end{bmatrix}, \eqno(4.55)$$
where each $\ddot{h}_j$ is the $p\times p$ matrix $\partial^2 h_{j\omega}(\theta;y)/\partial\theta\,\partial\theta'\big|_0$. Note that the matrices $R_j$ and $\ddot{h}_j$ appeared earlier in the computational formula for $P$ (Theorem 4.4).

Proof of Theorem 4.5. To begin, we present the following two matrix differentiation results for a matrix $A$ and a scalar $t$ (Searle et al. 1992):
$$\frac{\partial A^{-1}}{\partial t} = -A^{-1}\frac{\partial A}{\partial t}A^{-1}, \eqno(4.56)$$
$$\frac{\partial}{\partial t}\log|A| = \mathrm{trace}\left(A^{-1}\frac{\partial A}{\partial t}\right). \eqno(4.57)$$
Combining these two identities yields
$$\frac{\partial}{\partial t}\log\left|-A^{-1}\right| = -\frac{\partial}{\partial t}\log\left|-A\right| = -\mathrm{trace}\left((-A)^{-1}\frac{\partial(-A)}{\partial t}\right) = -\mathrm{trace}\left(A^{-1}\frac{\partial A}{\partial t}\right). \eqno(4.58)$$
Using (4.58), we find
$$\frac{\partial m(\omega,\hat{\theta}_\omega)}{\partial\omega_k} = -\mathrm{trace}\left(\dot{h}_\omega(\hat{\theta}_\omega;y)^{-1}\,\frac{\partial^2 h_\omega(\theta;y)}{\partial\omega_k\,\partial\theta'}\bigg|_{\theta=\hat{\theta}_\omega}\right),$$
and evaluating at $\omega = \omega_0$ and $\theta = \hat{\theta}$ yields (4.52). Expression (4.53) follows from a similar argument.

The next step is construction of (4.54).
First, recall that the trace of the product of two matrices $A_{r\times c}$ and $B_{c\times r}$ is $\mathrm{trace}(AB) = \sum_i\sum_j a_{ij}b_{ji}$ (Searle 1982). Letting $\dot{h}^{ij}$ denote the $ij$th element of $\dot{h}^{-1}$, we may write (4.52) as
$$s_{\omega k} = \sum_{i=1}^{p}\sum_{j=1}^{p}\frac{\partial^2 h_{i\omega}(\theta;y)}{\partial\omega_k\,\partial\theta_j}\bigg|_0\,(-\dot{h}^{ji}).$$
Collecting the $q$ elements $s_{\omega 1},\dots,s_{\omega q}$ and arranging the second derivatives into the matrices $R_1,\dots,R_p$ shows that
$$s_\omega = \begin{bmatrix} R_1 & R_2 & \cdots & R_p \end{bmatrix}\mathrm{vec}(-\dot{h}^{-1}).$$
Construction of $s_\gamma$ proceeds similarly. □

Corollary 4.3. For the measure (4.51), the slope in direction $v$ is
$$\dot{m}'v = \mathrm{vec}(-\dot{h}^{-1})'\left(R' + \ddot{h}'J\right)v, \eqno(4.59)$$
and the direction of maximum absolute slope is
$$v_{\max} \propto \dot{m} = \left(R + J'\ddot{h}\right)\mathrm{vec}(-\dot{h}^{-1}). \eqno(4.60)$$

4.7.2 Variance Perturbation for ML Regression

In this section, first-order local assessment diagnostics are derived for the variance perturbation in ML regression, with the perturbed generalized variance as the objective measure. Because $\dot{h}$ has a simple form, we derive $s_\omega$ and $s_\gamma$ element-by-element rather than construct the matrices in (4.54) and (4.55).

4.7.2.1 Construction of s_ω

To derive the $i$th element of $s_\omega$, we begin with the following results from Section 2.5:
$$\frac{\partial^2 L_\omega(\theta;y)}{\partial\beta\,\partial\omega_i} = \frac{1}{\sigma^2}(y_i - x_i'\beta)x_i, \eqno(4.61)$$
$$\frac{\partial^2 L_\omega(\theta;y)}{\partial\sigma^2\,\partial\omega_i} = \frac{1}{2\sigma^4}(y_i - x_i'\beta)^2, \eqno(4.61a)$$
where $x_i$ is the covariate vector for the $i$th observation. Taking partial derivatives of (4.61) leads to
$$\frac{\partial^3 L_\omega(\theta;y)}{\partial\beta\,\partial\beta'\,\partial\omega_i} = -\frac{1}{\sigma^2}x_ix_i', \qquad \frac{\partial^3 L_\omega(\theta;y)}{\partial\sigma^2\,\partial\beta\,\partial\omega_i} = -\frac{1}{\sigma^4}(y_i - x_i'\beta)x_i.$$
Taking partial derivatives of (4.61a) yields
$$\frac{\partial^3 L_\omega(\theta;y)}{\partial(\sigma^2)^2\,\partial\omega_i} = -\frac{1}{\sigma^6}(y_i - x_i'\beta)^2.$$
After evaluating these expressions at the null perturbation, with $r_i = y_i - x_i'\hat{\beta}$, we can construct the matrix
$$\frac{\partial^2}{\partial\theta\,\partial\theta'}\,\frac{\partial L_\omega(\theta;y)}{\partial\omega_i}\bigg|_0 = \begin{bmatrix} -\dfrac{1}{\hat{\sigma}^2}x_ix_i' & -\dfrac{r_i}{\hat{\sigma}^4}x_i \\[4pt] -\dfrac{r_i}{\hat{\sigma}^4}x_i' & -\dfrac{r_i^2}{\hat{\sigma}^6} \end{bmatrix}. \eqno(4.62)$$
Accordingly, the trace of the following matrix multiplication is the $i$th element of $s_\omega$:
$$\begin{bmatrix} -\dfrac{1}{\hat{\sigma}^2}x_ix_i' & -\dfrac{r_i}{\hat{\sigma}^4}x_i \\[4pt] -\dfrac{r_i}{\hat{\sigma}^4}x_i' & -\dfrac{r_i^2}{\hat{\sigma}^6} \end{bmatrix}\begin{bmatrix} \hat{\sigma}^2(X'X)^{-1} & 0 \\ 0' & \dfrac{2\hat{\sigma}^4}{n} \end{bmatrix} = \begin{bmatrix} -x_ix_i'(X'X)^{-1} & -\dfrac{2r_i}{n}x_i \\[4pt] -\dfrac{r_i}{\hat{\sigma}^2}x_i'(X'X)^{-1} & -\dfrac{2r_i^2}{n\hat{\sigma}^2} \end{bmatrix}. \eqno(4.63)$$
In order to find a formula for the trace, an expression for the $k$th diagonal element of the upper left portion of the matrix is needed.
Letting $a_{jk}$ be the $jk$th element of $(X'X)^{-1}$, the $k$th diagonal element of $x_ix_i'(X'X)^{-1}$ is $x_{ik}\sum_{j=1}^{p-1}x_{ij}a_{jk}$. Hence, the trace of (4.63) is given by
$$s_{\omega i} = -\sum_{k=1}^{p-1}x_{ik}\sum_{j=1}^{p-1}x_{ij}a_{jk} - \frac{2r_i^2}{n\hat{\sigma}^2} = -x_i'(X'X)^{-1}x_i - \frac{2r_i^2}{n\hat{\sigma}^2}.$$
From this, the vector $s_\omega$ can finally be constructed. Letting $X_j$ denote the $j$th column of $X$, $D(X_j)$ the diagonal matrix with $X_j$ on its diagonal, and $r_{sq}$ the $n\times 1$ vector of squared residuals, the $n$ leverages $x_i'(X'X)^{-1}x_i$ stack into $[D(X_1)X\;\cdots\;D(X_{p-1})X]\,\mathrm{vec}(X'X)^{-1}$, so that
$$s_\omega = -\begin{bmatrix} D(X_1) & \cdots & D(X_{p-1}) \end{bmatrix}\left[I_{p-1}\otimes X\right]\mathrm{vec}(X'X)^{-1} - \frac{2}{n\hat{\sigma}^2}\,r_{sq}. \eqno(4.64)$$

4.7.2.2 Construction of s_γ

Starting from the Hessian of the unperturbed log-likelihood,
$$\ddot{L}(\theta) = \frac{\partial^2 L(\theta;y)}{\partial\theta\,\partial\theta'} = \begin{bmatrix} -\dfrac{1}{\sigma^2}X'X & -\dfrac{1}{\sigma^4}X'(y - X\beta) \\[4pt] -\dfrac{1}{\sigma^4}(y - X\beta)'X & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}(y - X\beta)'(y - X\beta) \end{bmatrix}, \eqno(4.65)$$
the following partial derivatives at the null perturbation can be computed, using $X'(y - X\hat{\beta}) = 0$ and $(y - X\hat{\beta})'(y - X\hat{\beta}) = n\hat{\sigma}^2$:
$$\frac{\partial\ddot{L}(\theta;y)}{\partial\beta_j}\bigg|_{\hat{\theta}} = \begin{bmatrix} 0 & \dfrac{1}{\hat{\sigma}^4}X'X_j \\[4pt] \dfrac{1}{\hat{\sigma}^4}X_j'X & 0 \end{bmatrix}, \qquad \frac{\partial\ddot{L}(\theta;y)}{\partial\sigma^2}\bigg|_{\hat{\theta}} = \begin{bmatrix} \dfrac{1}{\hat{\sigma}^4}X'X & 0 \\ 0' & \dfrac{2n}{\hat{\sigma}^6} \end{bmatrix}.$$
Now the elements of $s_\gamma$ can be constructed:
$$\frac{\partial m(\omega,\hat{\theta}_\omega)}{\partial\beta_j}\bigg|_0 = \mathrm{trace}\left(\begin{bmatrix} 0 & \dfrac{1}{\hat{\sigma}^4}X'X_j \\[2pt] \dfrac{1}{\hat{\sigma}^4}X_j'X & 0 \end{bmatrix}\begin{bmatrix} \hat{\sigma}^2(X'X)^{-1} & 0 \\ 0' & \dfrac{2\hat{\sigma}^4}{n} \end{bmatrix}\right) = 0. \eqno(4.66)$$
Also,
$$\frac{\partial m(\omega,\hat{\theta}_\omega)}{\partial\sigma^2}\bigg|_0 = \mathrm{trace}\left(\begin{bmatrix} \dfrac{1}{\hat{\sigma}^4}X'X & 0 \\ 0' & \dfrac{2n}{\hat{\sigma}^6} \end{bmatrix}\begin{bmatrix} \hat{\sigma}^2(X'X)^{-1} & 0 \\ 0' & \dfrac{2\hat{\sigma}^4}{n} \end{bmatrix}\right) = \frac{p-1}{\hat{\sigma}^2} + \frac{4}{\hat{\sigma}^2} = \frac{p+3}{\hat{\sigma}^2}. \eqno(4.67)$$
Hence, all but the last element of $s_\gamma$ is zero:
$$s_\gamma = \begin{bmatrix} 0 & \cdots & 0 & \dfrac{p+3}{\hat{\sigma}^2} \end{bmatrix}'.$$

4.7.2.3 Directional derivative

Using the results above and $\Delta$ from Section 2.5,
$$\Delta = \begin{bmatrix} \dfrac{1}{\hat{\sigma}^2}X'D(r) \\[4pt] \dfrac{1}{2\hat{\sigma}^4}r_{sq}' \end{bmatrix},$$
we may finally construct the directional derivative of the log of the perturbed generalized variance:
$$\dot{m}'v = \left(s_\omega' + s_\gamma'K(-\ddot{L}^{-1})\Delta\right)v = \left(s_\omega' + \begin{bmatrix} 0' & \dfrac{p+3}{\hat{\sigma}^2}\end{bmatrix}\begin{bmatrix} \hat{\sigma}^2(X'X)^{-1} & 0 \\ 0' & \dfrac{2\hat{\sigma}^4}{n}\end{bmatrix}\begin{bmatrix} \dfrac{1}{\hat{\sigma}^2}X'D(r) \\[2pt] \dfrac{1}{2\hat{\sigma}^4}r_{sq}'\end{bmatrix}\right)v = \left(s_\omega' + \frac{p+3}{n\hat{\sigma}^2}\,r_{sq}'\right)v$$
$$= \left(-\begin{bmatrix} D(X_1) & \cdots & D(X_{p-1}) \end{bmatrix}\left[I_{p-1}\otimes X\right]\mathrm{vec}(X'X)^{-1} + \frac{p+1}{n\hat{\sigma}^2}\,r_{sq}\right)'v. \eqno(4.68)$$
Hence, the direction of the maximum absolute slope is
$$v_{\max} \propto -\begin{bmatrix} D(X_1) & \cdots & D(X_{p-1}) \end{bmatrix}\left[I_{p-1}\otimes X\right]\mathrm{vec}(X'X)^{-1} + \frac{p+1}{n\hat{\sigma}^2}\,r_{sq}. \eqno(4.69)$$

4.7.2.4 Simple random sample

As a special case, consider a normally distributed simple random sample (SRS), for which $p = 2$ and $X = 1_n$. The direction of maximum slope simplifies to
$$v_{\max} \propto \dot{m} = -\frac{1}{n}1_n + \frac{3}{n\hat{\sigma}^2}\,r_{sq} \propto r_{sq} - \frac{\hat{\sigma}^2}{3}1_n. \eqno(4.70)$$
Hence, observations with squared residuals that deviate greatly from $\hat{\sigma}^2/3$ have large elements in $v_{\max}$. This may seem odd, but recall that this represents influence on an estimated joint asymptotic confidence region for $\hat{\mu}$ and $\hat{\sigma}^2$. If the log of the variance of these two estimates is considered separately, the directions of maximum absolute slope are respectively given by
$$v_{\max} \propto r_{sq} - \hat{\sigma}^2 1_n \qquad\text{and}\qquad v_{\max} \propto r_{sq}.$$
The direction of maximum slope for $\hat{\mu}$ is dominated by squared residuals that deviate greatly from the average squared residual, $\hat{\sigma}^2$, while $v_{\max}$ for $\hat{\sigma}^2$ is dominated by large squared residuals. The direction of maximum slope for the joint influence measure is simply a weighted average of these two vectors.

Figure 4.1: Hills data; v_max for perturbed log generalized variance

4.7.2.5 Examples

As examples, we consider the Scottish hills data and the school data. For the hills data, the element of $v_{\max}$ for observation 11 is large and the element for observation 18 is huge (Figure 4.1). For the school data, elements 3 and 18 show substantial influence on the perturbed generalized variance (Figure 4.2).
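The SRS simplification can be verified numerically. The sketch below is an illustration with simulated data, not one of the dissertation's examples: under the variance perturbation ($\sigma_i^2 = \sigma^2/\omega_i$, as in Section 2.5), it computes the perturbed MLEs and the perturbed log generalized variance in closed form for an intercept-only model, and checks that the finite-difference gradient is proportional to $r_{sq} - (\hat{\sigma}^2/3)1_n$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
y = rng.normal(5.0, 2.0, size=n)
mu, s2 = y.mean(), y.var()          # MLEs for the SRS model (ddof = 0)
r2 = (y - mu) ** 2                  # squared residuals, r_sq

# Perturbed log generalized variance under sigma_i^2 = sigma^2 / w_i:
# the perturbed MLEs are the weighted mean and Sum(w_i r_i^2)/n, and
# -L"_w at those values is diag(Sum(w)/s2_w, n/(2 s2_w^2)).
def m(w):
    mu_w = (w * y).sum() / w.sum()
    s2_w = (w * (y - mu_w) ** 2).sum() / n
    return -np.log(w.sum() / s2_w) - np.log(n / (2 * s2_w**2))

# Central-difference gradient at the null perturbation w = 1_n
h = 1e-6
g = np.empty(n)
for i in range(n):
    wp = np.ones(n); wp[i] += h
    wm = np.ones(n); wm[i] -= h
    g[i] = (m(wp) - m(wm)) / (2 * h)

# The gradient should equal (3/(n s2)) * (r_sq - (s2/3) 1_n), as in (4.70)
assert np.allclose(g, 3 / (n * s2) * (r2 - s2 / 3), rtol=1e-4, atol=1e-6)
```

The check confirms both the constant of proportionality and the centering value $\hat{\sigma}^2/3$ for the joint measure.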
Figure 4.2: School data; v_max for perturbed log generalized variance

4.7.2.6 A simpler measure

The directional derivative simplifies greatly if the measure is based on the unperturbed generalized variance evaluated at the perturbed estimates:
$$m = \log\left|\left(-\ddot{L}(\hat{\theta}_\omega;y)\right)^{-1}\right|.$$
In this situation, $s_\omega$ vanishes, but the other derivations follow from above, yielding the following results for the variance perturbation:
$$\dot{m} = \frac{p+3}{n\hat{\sigma}^2}\,r_{sq}, \qquad v_{\max} \propto r_{sq}.$$

4.8 Local Assessment for Quasi-likelihood Inference

In this section, local influence assessment of quasi-likelihood estimates (Wedderburn 1974) for independent observations is outlined. A thorough treatment of quasi-likelihood inference can be found in McCullagh and Nelder (1989). Here, we present only some fundamental concepts, viewed from the vantage point of optimal estimating equations (Godambe and Heyde 1987; Desmond 1997a). Some local influence techniques are then outlined.

4.8.1 Quasi-likelihood

Consider a model in which a set of $n$ independent observations $y$ depends on a set of $p$ parameters $\theta$ through their means $\mu(\theta)$. Although full distributional assumptions are not made, we also assume that the variances of $y$ depend on their means: $\mathrm{V}[y] = \sigma^2 V(\mu)$. A class of estimating functions that we may consider for estimating $\theta$ is given by
$$h^*(\theta;y) = W(\theta)\left(y - \mu(\theta)\right), \eqno(4.71)$$
where the $p\times n$ weight matrix $W(\theta)$ can be a function of the unknown parameters. Standardizing (4.71) as in (4.3) yields (Wedderburn 1974; Godambe and Heyde 1987)
$$h(\theta;y) = \frac{\partial\mu'}{\partial\theta}\,V(\mu)^{-1}\left(y - \mu(\theta)\right)/\sigma^2. \eqno(4.72)$$
These optimal estimating functions are called the quasi-score functions because the solutions to $h(\theta;y) = 0$ maximize the following "quasi"-likelihood function, provided the integral exists:
$$Q(\mu;y) = \sum_{i=1}^{n}\int_{y_i}^{\mu_i}\frac{y_i - t}{\sigma^2 V(t)}\,dt.$$

4.8.2 Local Assessment

Some preliminary results for local assessment of quasi-likelihood models for independent data are presented in this section.
Assumptions about the model's form are presented, some perturbation schemes are proposed, and a case-weight perturbation is applied to an example data set.

4.8.2.1 Assumptions

It is assumed that observations are independent, and that the variance is related to the mean by the function $V(\mu)$. In addition, it is assumed that the mean is related to a linear predictor $\eta = x'\beta$ through a link function $g$: $g(\mu) = \eta$.

4.8.2.2 Perturbation Schemes

One perturbation is to introduce case-weights or, equivalently, variance perturbations. A set of $n$ perturbations can be introduced into the estimating functions in a multiplicative manner:
$$h_\omega(\theta;y) = \frac{\partial\mu'}{\partial\theta}\,D(\omega)\,V(\mu)^{-1}\left(y - \mu(\theta)\right)/\sigma^2.$$
Writing the $j$th perturbed estimating function in summation notation shows how the perturbations weight the contributions of the individual observations:
$$h_{j\omega}(\theta;y) = \frac{1}{\sigma^2}\sum_{i=1}^{n}\omega_i\,\frac{\partial\mu_i}{\partial\theta_j}\,\frac{y_i - \mu_i(\theta)}{V(\mu_i)}.$$
The matrix $\Delta$ is then given by
$$\Delta = \frac{\partial h_\omega(\theta;y)}{\partial\omega'}\bigg|_0 = \frac{1}{\hat{\sigma}^2}\,\frac{\partial\mu'}{\partial\theta}\,D\!\left(\frac{y-\mu}{V(\mu)}\right),$$
where $D((y-\mu)/V(\mu))$ is a diagonal matrix with $i$th diagonal element $(y_i - \hat{\mu}_i)/V(\hat{\mu}_i)$.

If the response is continuous, then a perturbation can be made to assess additive errors in measurement. The perturbed estimating functions are then
$$h_\omega(\theta;y) = \frac{\partial\mu'}{\partial\theta}\,V(\mu)^{-1}\left(y + \omega - \mu(\theta)\right)/\sigma^2,$$
which leads to
$$\Delta = \frac{\partial h_\omega(\theta;y)}{\partial\omega'}\bigg|_0 = \frac{1}{\hat{\sigma}^2}\,\frac{\partial\mu'}{\partial\theta}\,V(\mu)^{-1}.$$
Other perturbations can also be introduced, including errors in the explanatory variables and perturbations to the linear predictor.

4.8.2.3 Example: Case-weights Applied to Leaf Blotch Data

As an example, we apply the case-weight perturbation to a data set concerning leaf blotch (Wedderburn 1974; McCullagh and Nelder 1989). The percentage of leaf area affected by the blotch was recorded for 10 varieties of barley grown at 9 sites. Although the response is continuous, it can be analyzed as if arising from binomial sampling, using the logit link function and a main-effects model:
$$V(\mu_i) = \mu_i(1 - \mu_i), \qquad g(\mu_i) = \log\frac{\mu_i}{1 - \mu_i} = \eta_i.$$
Plotting the Pearson residuals from this model against the linear predictor indicates that the model overestimates the variance for the largest and smallest proportions (Figure 4.3). Applying the case-weight perturbation requires $\dot{h}$ and the following matrix:
$$\Delta = \frac{1}{\hat{\sigma}^2}X'D(y - \hat{\mu}),$$
where $X$ is the model matrix and $D(y - \hat{\mu})$ is a diagonal matrix of the raw residuals. To measure the effects of perturbation, we use the unperturbed Pearson chi-squared statistic evaluated at the perturbed MLEs:
$$m = \sum_{i=1}^{n}\frac{\left(y_i - \mu_i(\hat{\theta}_\omega)\right)^2}{V\!\left(\mu_i(\hat{\theta}_\omega)\right)}.$$
This implies that the final building block for a first-order assessment of this measure is given by
$$s_\gamma = \frac{\partial m}{\partial\theta}\bigg|_{\hat{\theta}} = X'z_1,$$
where the $n\times 1$ vector $z_1$ has $i$th element
$$\frac{2y_i\hat{\mu}_i^2 + y_i^2(1 - 2\hat{\mu}_i) - \hat{\mu}_i^2}{\hat{\mu}_i(1 - \hat{\mu}_i)}.$$
Plotting the Pearson residuals from this model against the linear predictor indicates that the model overestimates the variance for the largest and smallest proportions (Figure 4.3). Applying the case weight perturbation requires h and the following matrix: - ,. . , ' ,• A = ^X'D{y-ii), where X is the model matrix and D{y — fi) is a diagonal matrix of the raw residuals. To measure the effects of perturbation, we use the unperturbed Pearson Chi-squared statistic evaluated at the perturbed MLEs: m = E This implies that the final building block for a first-order assesment of this measure is given by dm ^'' 80 = X'z,, where the n x 1 vector Zi has z*'' element 2y,/if + yf(l-2/i,)-/^^ Ai(i-Ai) 129 The direction of maximum local influence is then Vraay.CC D{y- ij,)Xh X' Zi- Plotting the elements of Vmax against case indices indicates that elements 24 (site 3 variety 4) and 65 (site 7 variety 5) are influential (Figure 4.4). When plotting elements against the corresponding linear predictor, it is clear that observations with moderately-sized proportions are more influential (Figure 4.5). As in Wedderburn (1974) and McCullagh and Nelder (1989), we consider fitting a model with an alternate variance function, V{ijl) = /i^(l — /x)^. This function reflects the assumption that extreme proportions are more precise than those arising from a binomial distribution. Pearson residuals indicate that the alternate variance function is more realistic, but that a relationship between the residual variance and the the linear predictor remains (Figure 4.6). Implementing the case- weight perturbation and assessing its effects on the Pearson Chi-square statistic requires the following quantities: ■: ^ -, ^' '-'■ '■' - "'' .■- ■; 0-2 /i(l - /x) and dm X'Z2, where the n x 1 vector Z2 has i*'' element 2{fif - Sfj-fVi + ili{2yf + yj) - yf) The direction of maximum local influence is then t^max OC D{^^—^)Xh ^X'Z2. 
Figure 4.3: Leaf blotch data; Model I; Pearson residuals vs. linear predictor

A plot of $v_{\max}$ indicates that only observation 24 is influential for this model (Figure 4.7). The elements have less association with the linear predictor than in the binomial model (Figure 4.8), indicating that the contribution of the observations to inference is more appropriate in the second model.

Figure 4.4: Leaf blotch data; Model I; v_max. The first ten indices correspond to site 1, varieties 1-10, while the second ten indices correspond to site 2, varieties 1-10, etc. The dotted vertical lines are present to help differentiate sites and varieties within sites.

Figure 4.5: Leaf blotch data; Model I; v_max vs. linear predictor

Figure 4.6: Leaf blotch data; Model II; Pearson residuals vs. linear predictor

Figure 4.7: Leaf blotch data; Model II; v_max. The first ten indices correspond to site 1, varieties 1-10, while the second ten indices correspond to site 2, varieties 1-10, etc. The dotted vertical lines are present to help differentiate sites and varieties within sites.

Figure 4.8: Leaf blotch data; Model II; v_max vs. linear predictor

CHAPTER 5
LOCAL ASSESSMENT OF PREDICTIONS

In this chapter, local assessment in prediction problems is considered. First, prediction is motivated by two methods: first principles and predictive likelihood. This is followed by a brief discussion of influence measures for estimation and prediction.
Then, predictive inference and existing local influence techniques for linear mixed models are reviewed. These techniques are then extended to assess the effects of perturbations on predictions, and three perturbation schemes are applied to an example data set.

5.1 Prediction via First Principles

Consider the problem of predicting a vector of random variables, $u$. Among predictors $u^*$, $\tilde{u} = \mathrm{E}(u)$ minimizes the mean squared error of prediction, $\mathrm{MSEP}(u^*) = \mathrm{E}(u - u^*)^2$. The standard error associated with this prediction is $\sqrt{\mathrm{MSEP}(\tilde{u})} = \sqrt{\mathrm{V}(u)}$. However, prediction is typically not so simple; it is usually performed using information from a sample and in the presence of unknown parameters.

5.1.1 Prediction After Data Are Observed

Suppose now that prediction of $u$ takes place after observing data $y$. If the two random vectors are correlated, then an improved inference can be made about $u$. Here, the conditional mean, $\hat{u} = \mathrm{E}(u|y)$, can serve as a point prediction. This prediction minimizes the conditional mean squared error of prediction, $\mathrm{CMSEP}(u^*) = \mathrm{E}\left[(u - u^*)^2\,\big|\,y\right]$. Since it minimizes the CMSEP for each given $y$, it also minimizes the (marginal) MSEP. This best predictor, $\hat{u}$, differs from $\tilde{u}$ in that it is a random variable, rather than a constant. Also, the best predictor's MSEP can be no larger than that of $\tilde{u}$:
$$\mathrm{MSEP}(\tilde{u}) = \mathrm{V}(u) = \mathrm{E}\left(\mathrm{V}(u|y)\right) + \mathrm{V}\left(\mathrm{E}(u|y)\right) \geq \mathrm{E}\left(\mathrm{V}(u|y)\right) = \mathrm{E}\left(\mathrm{E}\left[(u - \hat{u})^2\,\big|\,y\right]\right) = \mathrm{E}\left(\mathrm{CMSEP}(\hat{u})\right) = \mathrm{MSEP}(\hat{u}). \eqno(5.1)$$
Other predictors can be derived if the class of predictors is restricted. For instance, minimizing the MSEP among predictors that are linear functions of $y$ results in the best linear predictor (Searle et al. 1992):
$$\mathrm{BLP}(u) = \mathrm{E}(u) + CV^{-1}\left(y - \mathrm{E}(y)\right), \eqno(5.2)$$
where $V$ is the variance of $y$ and $C$ is the covariance of $u$ and $y$. Although not required by its definition, the BLP is an unbiased predictor of $u$ since $\mathrm{E}(\mathrm{BLP}(u)) = \mathrm{E}(u)$.

5.1.2 Prediction After Data Are Observed, but with Unknown Parameters

Typically, the goal of prediction is accompanied by a need to estimate unknown parameters $\theta$.
Here, we may first estimate the parameters using the observed data and then estimate the best predictor. The MSEP of the estimated best predictor can then serve as a measure of uncertainty in the prediction, although Booth and Hobert (1998) argue that CMSEP is more relevant.

To illustrate, consider the normal linear regression model with unknown parameters $\beta$ but known variance $\sigma^2$. After observing data $y$, the goal is to predict future independent observations $u \sim N(Z\beta, \sigma^2 I_m)$. From either ordinary least squares or ML, $\hat{\beta} = (X'X)^{-1}X'y$. Hence, the estimate of $\mathrm{E}(u|y) = \mathrm{E}(u)$ is $Z\hat{\beta}$, and its mean squared error of prediction can be shown to be $\sigma^2\left(Z(X'X)^{-1}Z' + I_m\right)$. Also, $Z\hat{\beta}$ is normally distributed. These facts can be used to create a $100(1-\alpha)\%$ prediction interval for a single $u$ with covariate vector $z$:
$$z'\hat{\beta} \pm z_{\alpha/2}\sqrt{\sigma^2\left(z'(X'X)^{-1}z + 1\right)}. \eqno(5.3)$$
In the event that $\sigma^2$ is unknown, the prediction interval given in (5.3) is still valid but cannot be calculated. One option is to simply replace $\sigma^2$ by $s^2$ to produce an approximate prediction interval. A better alternative is to make use of the following pivotal quantity:
$$\frac{u - z'\hat{\beta}}{\sqrt{s^2\left(z'(X'X)^{-1}z + 1\right)}} \sim t_{n-k}, \eqno(5.4)$$
where $k$ is the number of regressors. An exact $100(1-\alpha)\%$ prediction interval is then
$$z'\hat{\beta} \pm t_{n-k,\alpha/2}\sqrt{s^2\left(z'(X'X)^{-1}z + 1\right)}. \eqno(5.5)$$
This simple situation highlights the fact that unknown parameters can complicate the prediction problem. While pivotal quantities may be available in simple situations, accommodating unknown parameters may be more cumbersome in complicated models. Therefore, attempts have been made to unify predictive inference via predictive likelihood.

5.2 Predictive Likelihood

Predictive likelihood is a likelihood-based framework for handling the problem of prediction. The goal is to predict $u$ via a point or interval predictor in the presence of unknown parameters $\theta$. If $u$ is treated as a vector of parameters, then the joint likelihood of $u$ and $\theta$ is
$$L^*(u,\theta;y) = f(y,u|\theta) = f_1(u|\theta,y)\,f_2(y|\theta). \eqno(5.6)$$
A predictive likelihood for $u$ would result by somehow eliminating $\theta$ from the above expression. For Bayesians, this can be achieved in a straightforward manner: the posterior density of $\theta$ given the data $y$ is obtained, and then (5.6) is integrated over the density of $\theta$. The resulting expression is a posterior predictive likelihood for $u$. For frequentists there are several competing options, as outlined in Butler (1986) and Bjørnstad (1990). Here we consider two: the estimative predictive likelihood and the profile predictive likelihood.

5.2.1 Estimative and Profile Predictive Likelihood

The estimative/empirical predictive likelihood replaces $\theta$ in (5.6) by $\hat{\theta}$, its MLE. The resulting predictive likelihood is actually proportional to the density of $u$ given $y$ with the MLE inserted in place of $\theta$. This is considered a naive approach, as resulting inferences for $u$ do not take into account the uncertainty in the estimated parameters.

An alternative frequentist predictive likelihood can be motivated by profile likelihood. The idea works as follows. If $\theta$ is considered a set of nuisance parameters, then a technique for estimation is to first maximize the joint likelihood of $(u,\theta)$ with respect to $\theta$ for fixed $u$. The resulting solution, $\hat{\theta}(y,u)$, is then substituted into the likelihood. The resulting expression is a function of only $u$ and $y$, and is known as the profile predictive likelihood of $u$ (Mathiasen 1979; Lejeune and Faulkenberry 1982; Levy and Perng 1984).

5.2.2 Point and Interval Predictions

Regardless of which predictive likelihood is chosen, point and interval predictions are of interest. One option for a point prediction is to maximize the predictive likelihood $\ell(u;y)$ with respect to $u$. However, the resulting maximum likelihood predictor can have poor properties (see Bjørnstad, 1990, examples 4 and 7). An alternative can be motivated by standardizing the predictive likelihood to be a predictive density for $u$: $f(u|y) = \ell(u;y)/f_2(y)$.
The mean of this conditional density, the predictive mean, is the preferred point predictor. Standard errors and prediction intervals can also be constructed from this predictive density.

5.2.2.1 Example

Consider again the situation of predicting future observations in a linear regression model with $\theta' = (\beta', \sigma^2)$ unknown. Naively replacing $\theta$ by $\hat{\theta}$ in the conditional distribution of $u$ given $y$ yields the estimative predictive likelihood. Hence, inference proceeds under the assumption that $u$ is conditionally distributed as normal with mean $Z\hat{\beta}$ and variance $\hat{\sigma}^2 I_m$. A prediction interval for a single $u$ with covariate vector $z$ would then be
$$z'\hat{\beta} \pm z_{\alpha/2}\,\hat{\sigma}.$$
In contrast, the conditional distribution of $u$ from standardizing the profile predictive likelihood is multivariate $t$ with mean $Z\hat{\beta}$ and dispersion parameter proportional to $n\hat{\sigma}^2\left(I_m + Z(X'X)^{-1}Z'\right)$, as shown by Levy and Perng (1984). A prediction interval for a single $u$ would then be
$$z'\hat{\beta} \pm t_{n-p,\alpha/2}\,s\sqrt{1 + z'(X'X)^{-1}z},$$
where $s^2 = n\hat{\sigma}^2/(n - p)$. This matches the exact prediction interval given in (5.5).

5.3 Influence Measures for Prediction

In this section, some general measures are proposed for assessing the effects of perturbation on point estimation and point prediction. Notation is as follows: $\hat{\theta}$ are point estimates for parameters $\theta$ and $\hat{u}$ are point predictions for random variables $u$. Also, let $\hat{\theta}_\omega$ and $\hat{u}_\omega$ denote the perturbed versions of these point inferences, and let $PL(u;y,\theta)$ denote the log of the unstandardized predictive likelihood.

5.3.1 Predictive Likelihood Displacement

An overall measure of the change in estimates $\hat{\theta}$ and predictions $\hat{u}$ due to perturbation is the predictive likelihood displacement:
$$PLD(\omega) = 2\left[PL(\hat{u};y,\hat{\theta}) - PL(\hat{u}_\omega;y,\hat{\theta}_\omega)\right]. \eqno(5.7)$$
A local assessment of this influence measure could be first- or second-order, depending on the model and the method of inference.

Thomas and Cook (1990) considered influence on predictions in generalized linear models (GLMs). To explain, let $u$ denote a single future observation that is independent of the observed data.
Also, let $f(\cdot|\theta)$ denote the density function shared by the observed data and the future observation. Thomas and Cook (1990) proposed the following influence measure for assessing the effects of perturbations on predicting $u$:
$$d^*(\omega) = \log f(\hat{u}|\hat{\theta}) - \log f(\hat{u}|\hat{\theta}_\omega). \eqno(5.8)$$
In the context of the article, parameters were estimated by ML and point predictions were $\hat{u} = \mathrm{E}(u|y,\hat{\beta})$. For ML estimation of a GLM, $d^*(\omega)$ is non-negative and minimized at the null perturbation, indicating that a second-order local assessment is appropriate. However, in other situations it can be negative.

The influence measure $d^*(\omega)$ is motivated by a deviance residual for the $i$th observation (McCullagh and Nelder 1989):
$$\log f(y_i|\mu_i = y_i) - \log f(y_i|\mu_i = \hat{\mu}_i(\hat{\theta})), \eqno(5.9)$$
where $\mathrm{E}(y_i) = \mu_i$ and $\hat{\mu}_i$ is the estimated mean of the observation based on the model. In the context of GLMs, each deviance residual is non-negative, with smaller values indicating a better fit. The deviance of a model is defined as the sum of the $n$ deviance residuals and serves as a measure of overall goodness-of-fit.
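As a concrete illustration of (5.9), chosen here for simplicity rather than taken from the dissertation's examples, the quantity inside a Poisson deviance residual reduces to $y\log(y/\hat{\mu}) - (y - \hat{\mu})$, since the $\log y!$ terms cancel. A minimal sketch:

```python
import math

def poisson_dev_residual(y, mu_hat):
    """log f(y | mu = y) - log f(y | mu = mu_hat) for a Poisson density.

    The log y! terms cancel, leaving y*log(y/mu_hat) - (y - mu_hat);
    the y*log(y) term is taken as 0 when y = 0.
    """
    term = 0.0 if y == 0 else y * math.log(y / mu_hat)
    return term - (y - mu_hat)

# Non-negative, and zero only at a perfect fit (mu_hat = y)
assert poisson_dev_residual(4, 4.0) == 0.0
assert poisson_dev_residual(4, 2.5) > 0
assert poisson_dev_residual(0, 1.7) > 0
```

Non-negativity holds because the Poisson log-density in $\mu$ is maximized at $\mu = y$, which is the sense in which smaller deviance residuals indicate a better fit.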
We note that under the same assumptions given above, the predictive likelihood displacement (5.7) can be written as m PLD{u}) = LD[(jj) + 2 ^ di[u}), i=l where di{l^) =\og f{Ui\'p) -log f{UiS^ J. 5.4 Estimation and Prediction in Linear Mixed Models In this section, the linear mixed model is presented with point estimates and point predictions motivated in three ways. Then, local influence techniques for the 141 linear mixed model are reviewed, and additional tools for assessing predictions are presented. 5.4.1 Linear Mixed Models The linear mixed model (LMM) is formulated as follows: y = X)9 + zu + €, (5.12) where u 5 G e \ R y is an n X 1 vector of observations, X is a known n x pi matrix, Z is a known n X p2 matrix, ;9 is a pi x 1 vector of unknown fixed effects, u is a p2 x 1 vector of unknown random effects and e is a n x 1 vector of unknown residual errors. The matrices G and R are of order p2 and n respectively. Littell et al. (1996) provide a host of applications for this rich class of models. When there are no random effects and R = a"^!, this model simplifies to the fixed effects regression model (1.10). The inclusion of random effects and the general residual variance matrix R permits the modeling of correlated data. This is evident by looking at the marginal distribution of the data under model (5.12): % . «> *r>- ' ■I 'r** ■J- *- Y ^N{X^,ZGZ' + R). (5.13) The unknowns in (5.12) are the fixed effects 0, the random effects u, and the elements of the covariance matrices G and R. In this chapter, we will concentrate on assessing the effects of perturbations on inferences for and u. Point estimates of and point predictions of u can be motivated by three methods: first principles, mixed model equations, and profile predictive likelihood. See Searle et al. (1992) for information on estimating G and R when these matrices are unknown. 
5.4.2 Inference by First Principles

The best linear unbiased estimates (BLUEs) of $\beta$ are the generalized least squares estimates:
$$\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y, \eqno(5.14)$$
where $V = ZGZ' + R$. Here, "best" refers to minimum mean squared error of estimation. The estimates are equivalent to the MLEs derived from the marginal distribution of $Y$. The best linear unbiased predictors (BLUPs) for $u$ are given by
$$\hat{u} = GZ'V^{-1}c, \eqno(5.15)$$
where $c = y - X\hat{\beta}$ is the vector of generalized least squares or composite residuals, the latter term being coined by Hilden-Minton (1995). Here, "best" refers to minimum MSEP. Note that the BLUEs and BLUPs do not require normality, only the model and covariance structure specified in (5.12). Detailed derivations can be found in Searle (1971) and Searle et al. (1992).

5.4.3 Inference by the Mixed Model Equations

Henderson et al. (1959) derived estimates for the LMM by maximizing the joint log-likelihood of $u$ and $\beta$, which is proportional to
$$-\frac{1}{2}\left[(y - X\beta - Zu)'R^{-1}(y - X\beta - Zu) + u'G^{-1}u\right]. \eqno(5.16)$$
This likelihood is also known as an h-likelihood (Lee and Nelder 1996). The solutions that maximize (5.16) are in fact the same as the BLUEs and BLUPs. The resulting estimating equations are called the mixed model equations (MMEs) and are usually written
$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}. \eqno(5.17)$$
The corresponding set of estimating functions is
$$h(u,\beta;y) = \begin{bmatrix} X'R^{-1}(y - X\beta - Zu) \\ Z'R^{-1}(y - X\beta - Zu) - G^{-1}u \end{bmatrix}. \eqno(5.18)$$

5.4.4 Inference by Profile Predictive Likelihood

The BLUEs and BLUPs can also be derived by the profile predictive likelihood. Recall that inference using the profile predictive likelihood works as follows:

1. Maximize the joint likelihood of $u$ and $\beta$ with respect to $\beta$. Denote the solution as $\hat{\beta}(y,u)$.

2. Substitute $\hat{\beta}(y,u)$ into the joint likelihood. The resulting expression, $L^*(u,\hat{\beta}(y,u);y)$, is the profile predictive likelihood of $u$.

3. Obtain a prediction for $u$ by either of the following methods:

a.
Obtain the maximum likelihood predictor (MLP) by maximizing $L^*(u, \hat{\beta}(y,u); y)$ with respect to $u$. b. Obtain the conditional/predictive mean of $u$ given $(y', \hat{\beta}(y,u))$ from
\[
f_1(u \mid y, \hat{\beta}(y,u)) \propto L^*(u, \hat{\beta}(y,u); y).
\]

4. Obtain point estimates of $\beta$ by substituting the predictor of $u$ from step three into $\hat{\beta}(y,u)$.

By definition, the point inferences from steps 3a and 4 maximize the joint likelihood of $u$ and $\beta$. This implies that using the MLP from the profile predictive likelihood leads to the BLUEs and BLUPs. However, it is not as clear that the solutions from steps 3b and 4 lead to the same answers, and so the derivation follows. The first step is to maximize the joint likelihood of $u$ and $\beta$ with respect to $\beta$ for given $u$. This leads to the first set of mixed model equations (5.17), which have solution
\[
\hat{\beta}(y,u) = (X'R^{-1}X)^{-1}X'R^{-1}(y - Zu).
\]
The next step is to show that predicting $u$ with its conditional mean given the data and the estimate $\hat{\beta}(y,u)$ leads to the second set of mixed model equations. Starting with the profile predictive likelihood, we have
\[
\begin{aligned}
L^*(u,\hat{\beta}(y,u); y)
&\propto \exp\!\Big( -\tfrac{1}{2}\big(y - X\hat{\beta}(y,u) - Zu\big)'R^{-1}\big(y - X\hat{\beta}(y,u) - Zu\big) - \tfrac{1}{2}u'G^{-1}u \Big) \\
&= \exp\!\Big( -\tfrac{1}{2}y'R^{-1}\big(y - X\hat{\beta}(y,u) - Zu\big)
+ \tfrac{1}{2}\hat{\beta}'(y,u)X'R^{-1}\big(y - X\hat{\beta}(y,u) - Zu\big) \\
&\qquad\quad + \tfrac{1}{2}u'Z'R^{-1}\big(y - X\hat{\beta}(y,u) - Zu\big) - \tfrac{1}{2}u'G^{-1}u \Big) \\
&= \exp\!\Big( -\tfrac{1}{2}y'R^{-1}\big(y - X\hat{\beta}(y,u) - Zu\big)
+ \tfrac{1}{2}u'Z'R^{-1}\big(y - X\hat{\beta}(y,u) - Zu\big) - \tfrac{1}{2}u'G^{-1}u \Big),
\end{aligned}
\tag{5.19}
\]
since $X'R^{-1}(y - X\hat{\beta}(y,u) - Zu) = 0$. Examining the exponent, we find
\[
\begin{aligned}
&-\tfrac{1}{2}\Big( y'R^{-1}(y - X\hat{\beta}(y,u) - Zu) - u'Z'R^{-1}(y - X\hat{\beta}(y,u) - Zu) + u'G^{-1}u \Big) \\
&= -\tfrac{1}{2}\Big( y'R^{-1}y - y'R^{-1}X\hat{\beta}(y,u) - 2u'Z'R^{-1}y + u'Z'R^{-1}X\hat{\beta}(y,u) + u'Z'R^{-1}Zu + u'G^{-1}u \Big) \\
&\propto -\tfrac{1}{2}\Big( -y'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}(y - Zu) - 2u'Z'R^{-1}y \\
&\qquad\qquad + u'Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}(y - Zu) + u'Z'R^{-1}Zu + u'G^{-1}u \Big) \\
&\propto -\tfrac{1}{2}\Big( -2u'\big(Z'R^{-1} - Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}\big)y \\
&\qquad\qquad + u'\big({-Z'}R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}Z + Z'R^{-1}Z + G^{-1}\big)u \Big) \\
&= -\tfrac{1}{2}\Big( u'\big(Z'(I - R^{-1}X(X'R^{-1}X)^{-1}X')R^{-1}Z + G^{-1}\big)u
- 2u'\big(Z'(I - R^{-1}X(X'R^{-1}X)^{-1}X')\big)R^{-1}y \Big) \\
&= -\tfrac{1}{2}\Big( u'(Z'P'R^{-1}Z + G^{-1})u - 2u'Z'P'R^{-1}y \Big),
\end{aligned}
\tag{5.20}
\]
where $P' = I - R^{-1}X(X'R^{-1}X)^{-1}X'$.
It follows that
\[
f_1(u \mid y, \hat{\beta}(y,u)) \propto \exp\!\Big( -\tfrac{1}{2}\big( u'(Z'P'R^{-1}Z + G^{-1})u - 2u'Z'P'R^{-1}y \big) \Big).
\tag{5.21}
\]
This is proportional to the kernel of a normal distribution with mean
\[
(Z'P'R^{-1}Z + G^{-1})^{-1}Z'P'R^{-1}y,
\tag{5.22}
\]
and this predictive mean is the point prediction for $u$.

Meanwhile, substituting the solution from the first set of MMEs into the second set yields
\[
Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}(y - Z\hat{u}) + (Z'R^{-1}Z + G^{-1})\hat{u} = Z'R^{-1}y.
\tag{5.23}
\]
Solving for $\hat{u}$ provides the BLUP of the random effects,
\[
\hat{u} = (Z'P'R^{-1}Z + G^{-1})^{-1}Z'P'R^{-1}y,
\]
and this is identical to the predictive mean (5.22). Hence, predicting $u$ with the predictive mean from the profile predictive likelihood yields point inferences for $u$ and $\beta$ that are identical to those from the mixed model equations.

5.5 Local Influence Methods for Linear Mixed Models

5.5.1 Assessment in the Marginal Model

Local influence techniques for linear mixed models have been developed using the marginal model (5.13). Beckman et al. (1987) considered a special case of the linear mixed model known as the variance components model. This model is used for data with $a$ sources of grouping or clustering, where the $i$th source has $g_i$ levels. For $Z_i$ of size $n \times g_i$ and $u_i$ of size $g_i \times 1$, the model is given by
\[
y = X\beta + Z_1u_1 + Z_2u_2 + \cdots + Z_au_a + \epsilon.
\tag{5.24}
\]
This structure implies $Z = [Z_1\; Z_2\; \cdots\; Z_a]$, and $u$ has a total of $p_2 = \sum_{i=1}^{a} g_i$ elements. In addition to the general LMM assumptions, the $u_i$ vectors are mutually independent, with $V(u_i) = \sigma_i^2 I_{g_i}$ and $V(\epsilon) = \sigma^2 I_n$. The model implies that the marginal variance of $Y$ can be written as
\[
V = ZGZ' + R = \sum_{i=1}^{a} \sigma_i^2 Z_iZ_i' + \sigma^2 I.
\]
Beckman et al. (1987) assumed the fixed effects as well as the variance parameters, $(\sigma^2, \sigma_1^2, \ldots, \sigma_a^2)$, are jointly estimated by ML. Two perturbation schemes were introduced.
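Since the GLS formulas (5.14)-(5.15), the mixed model equations (5.17), and the predictive mean (5.22) are all claimed to produce identical point inferences, the equivalence is easy to verify numerically. The following sketch (assuming NumPy is available; the dimensions and variance values are illustrative, not taken from the text) checks all three routes:

```python
import numpy as np

# Verify numerically that three routes to point inference in the LMM agree:
#  (i)   GLS BLUE (5.14) and BLUP (5.15),
#  (ii)  Henderson's mixed model equations (5.17),
#  (iii) the predictive mean (5.22) from the profile predictive likelihood.
rng = np.random.default_rng(0)
n, p1, p2 = 12, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(p2), np.ones((n // p2, 1)))      # 3 clusters of 4
G, R = 2.0 * np.eye(p2), 0.5 * np.eye(n)            # V(u) = G, V(e) = R
y = rng.normal(size=n)

V = Z @ G @ Z.T + R
Vinv, Rinv, Ginv = map(np.linalg.inv, (V, R, G))

# (i) BLUE and BLUP
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # (5.14)
u = G @ Z.T @ Vinv @ (y - X @ beta)                      # (5.15)

# (ii) mixed model equations (5.17) give the same solutions
C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
sol = np.linalg.solve(C, np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y]))
assert np.allclose(sol[:p1], beta) and np.allclose(sol[p1:], u)

# (iii) predictive mean (5.22), with P' = I - R^{-1}X(X'R^{-1}X)^{-1}X'
P = np.eye(n) - Rinv @ X @ np.linalg.solve(X.T @ Rinv @ X, X.T)
mean = np.linalg.solve(Z.T @ P @ Rinv @ Z + Ginv, Z.T @ P @ Rinv @ y)
assert np.allclose(mean, u)
```

All three assertions pass for any positive definite choices of $G$ and $R$, reflecting the algebraic identities derived above.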
The first scheme perturbs the $i$th residual variance to $\omega_i\sigma^2$, while the second scheme introduces a perturbation to the variance of each random effect: the variance of the $j$th element of $u_i$ becomes $\omega_{ij}\sigma_i^2$. While both of these perturbations technically perturb the model assumptions, the major goal is to assess the influence of individual observations and clusters, respectively.

Lesaffre and Verbeke (1998) emphasized applications to longitudinal data. They considered the following linear mixed model:
\[
y_i = X_i\beta + Z_iu_i + \epsilon_i,
\tag{5.25}
\]
where $i = 1, \ldots, g$, $y_i$ is a vector of $n_i$ observations, $X_i$ is an $n_i \times p$ matrix, $Z_i$ is an $n_i \times t$ matrix, $u_i$ is a vector of $t$ random effects, and $\epsilon_i$ is a vector of $n_i$ errors. It is assumed that $V(u_i) = G_i$, $V(\epsilon_i) = \sigma^2 I_{n_i}$, and that $y_i$ and $y_{i'}$ are independent for $i \neq i'$. The model permits the inclusion of random slopes, and the matrix $G_i$ can be as general as needed. Also note that the model has a total of $p_2 = gt$ random effects. In the context considered by Lesaffre and Verbeke (1998), the subscript $i$ corresponds to the $i$th subject, from whom repeated measurements are taken over time.

The model implies that the log-likelihood is the sum of contributions from each of the subjects, and by allowing each of the $g$ subject weights to vary from 1, a subject-level case weight was introduced as a perturbation. Because the perturbation does not correspond to a new density for the data, it would be considered a perturbation of the estimation technique. Diagnostics were primarily based on basic perturbations (i.e., one-at-a-time perturbations) of subject weights. Lesaffre and Verbeke (1998) showed that the basic curvatures could be written as the sum of several terms, and the easily-interpretable terms were used as a means to compare the relative influence of subjects.
5.5.2 Assessment of Predictions

The influence of perturbations on both the BLUEs and BLUPs can be assessed by applying the techniques developed in Chapter 4 to the predictive likelihood displacement (5.7). The estimating functions, $h(\theta; y)$, are those used in the mixed model equations (5.18), and if the perturbed model is also a linear mixed model, the perturbed estimating functions, $h_\omega(\theta; y)$, are also of this form. Local assessment of the predictive likelihood displacement requires the following quantities:
\[
\theta = \begin{pmatrix} \beta \\ u \end{pmatrix}, \qquad
\hat{\gamma}(\hat{\theta}_\omega) = \hat{\theta}_\omega, \qquad
m(\omega) = PLD(\omega) = 2\big[ PL(\hat{u}; y, \hat{\beta}) - PL(\hat{u}_\omega; y, \hat{\beta}_\omega) \big].
\]
The next step is to determine whether first- or second-order local assessment is appropriate. Since the influence measure does not explicitly depend on $\omega$, $\dot{s}_\omega = 0$. Also, by derivation of the mixed model equations,
\[
\frac{\partial PLD(\omega)}{\partial \hat{\gamma}(\hat{\theta}_\omega)}
= \frac{\partial}{\partial \hat{\theta}_\omega}\big( -2PL(\hat{u}_\omega; y, \hat{\beta}_\omega) \big)
= -2\,\frac{\partial L(\theta; y)}{\partial \theta}\bigg|_{\hat{\theta}_\omega} = 0.
\]
The implication is that the gradient vector vanishes. Proceeding with a second-order assessment and using (4.32), the acceleration matrix is
\[
\ddot{M} = -2\Delta'\dot{h}^{-1\prime}\,
\frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}\,
\dot{h}^{-1}\Delta
= -2\Delta'\dot{h}^{-1}\Delta,
\tag{5.26}
\]
where $\dot{h} = \partial h(u,\beta; y)/\partial\theta'|_0$. The matrix $\dot{h}$ is derived as follows. Taking derivatives of the estimating functions with respect to $\theta$, we find
\[
\frac{\partial h(u,\beta; y)}{\partial\theta'}
= \frac{\partial}{\partial\theta'}
\begin{pmatrix} X'R^{-1}(y - X\beta - Zu) \\ Z'R^{-1}(y - X\beta - Zu) - G^{-1}u \end{pmatrix}
= -\begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix}.
\tag{5.27}
\]
The expression is not a function of $\theta$, so (5.27) is in fact $\dot{h}$. It can be shown by inverting this matrix that the variance matrix of the BLUEs and BLUPs is $-\dot{h}^{-1}$ (Searle et al. 1992), where
\[
-\dot{h}^{11} = (X'V^{-1}X)^{-1}, \qquad
-\dot{h}^{21} = -GZ'V^{-1}X(X'V^{-1}X)^{-1},
\]
\[
-\dot{h}^{22} = (Z'R^{-1}Z + G^{-1})^{-1} + GZ'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}ZG.
\tag{5.28}
\]
Once a perturbation is chosen and the perturbed estimating functions are constructed, computation of the acceleration matrix (5.26) is straightforward.

5.5.3 Assessment for Subsets of Point Inferences

Influence measures for subsets of the perturbed point inferences can also be developed.
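The partitioned-inverse identities (5.28) can be checked numerically. The following is a minimal sketch (assuming NumPy; the matrices are randomly generated for illustration):

```python
import numpy as np

# Check the blocks of -hdot^{-1} in (5.28): the top-left block is
# (X'V^{-1}X)^{-1} and the bottom-right block is
# (Z'R^{-1}Z + G^{-1})^{-1} + GZ'V^{-1}X (X'V^{-1}X)^{-1} X'V^{-1}ZG.
rng = np.random.default_rng(5)
n, p1, p2 = 10, 2, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, p2))
G = np.diag(rng.uniform(0.5, 2.0, p2))
R = 1.3 * np.eye(n)
V = Z @ G @ Z.T + R
Rinv, Ginv, Vinv = map(np.linalg.inv, (R, G, V))

neg_hdot = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                     [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
inv = np.linalg.inv(neg_hdot)

XVX_inv = np.linalg.inv(X.T @ Vinv @ X)
assert np.allclose(inv[:p1, :p1], XVX_inv)

br = (np.linalg.inv(Z.T @ Rinv @ Z + Ginv)
      + G @ Z.T @ Vinv @ X @ XVX_inv @ X.T @ Vinv @ Z @ G)
assert np.allclose(inv[p1:, p1:], br)
```

Both assertions hold for any full-rank $X$ and positive definite $G$ and $R$, which is what makes (5.26) cheap to compute: no matrix larger than the MME coefficient matrix ever needs to be inverted.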
If only the fixed effects are of interest, then assessment can be done using the marginal model (Beckman et al. 1987; Lesaffre and Verbeke 1998). More generally, a profile predictive likelihood displacement can be defined to assess the change in a subset of fixed and random effect point inferences, $\theta_1' = (\beta_1', u_1')$:
\[
PLD^{[\theta_1]}(\omega) = 2\big[ PL(\hat{u}; y, \hat{\beta})
- PL(\hat{u}_{1\omega}, g_u(\hat{\theta}_{1\omega}); y, \hat{\beta}_{1\omega}, g_\beta(\hat{\theta}_{1\omega})) \big],
\tag{5.29}
\]
where $\hat{\theta}_{1\omega}' = (\hat{\beta}_{1\omega}', \hat{u}_{1\omega}')$ comes from $\hat{\beta}_\omega' = (\hat{\beta}_{1\omega}', \hat{\beta}_{2\omega}')$ and $\hat{u}_\omega' = (\hat{u}_{1\omega}', \hat{u}_{2\omega}')$, and $(g_\beta'(\hat{\theta}_{1\omega}), g_u'(\hat{\theta}_{1\omega}))$ maximizes $PL(\hat{u}_{1\omega}, u_2; y, \hat{\beta}_{1\omega}, \beta_2)$ with respect to $(\beta_2', u_2')$. For linear mixed models, the acceleration matrix follows from the arguments for deriving the acceleration matrix of $LD^{[\theta_1]}(\omega)$ in Section 4.6.4, and is given by
\[
\ddot{M} = -2\Delta'\left( \dot{h}^{-1} - \begin{pmatrix} 0 & 0 \\ 0 & \dot{h}_{22}^{-1} \end{pmatrix} \right)\Delta,
\]
where $\dot{h}_{22}$ comes from the partition
\[
\dot{h} = \begin{pmatrix} \dot{h}_{11} & \dot{h}_{12} \\ \dot{h}_{21} & \dot{h}_{22} \end{pmatrix}.
\]
When influence on a linear combination, $x'\hat{\beta} + z'\hat{u}$, is of interest, the results of Section 2.7.1 can be used. Letting $x^{*\prime} = (x', z')$, it follows from the argument of Theorem 2.1 that
\[
C_{\max} = \frac{2}{\|k_1\|^2}\,\|\Delta'k_1\|^2, \qquad
v_{\max} \propto \Delta'\dot{h}^{-1}x^*,
\tag{5.30}
\]
where $k_1 = \dot{h}^{-1}x^*$.

5.6 One-way Random Effects Model

The one-way random effects model can be used to provide insight into the diagnostic value of locally assessing the linear mixed model. Some preliminary results for this special case of the LMM are presented here, so as not to clutter later explanations. The (unbalanced) one-way random effects model is
\[
y_{ij} = \mu + u_i + e_{ij}, \qquad i = 1, \ldots, g, \quad j = 1, \ldots, n_i,
\tag{5.31}
\]
with $V(u_i) = \sigma_u^2$ and $V(e_{ij}) = \sigma^2$. It is a special case of the variance components model (5.24) with $a = 1$, $X = 1_n$, and $\beta = \mu$, and is also a special case of the two-level model given in (5.25). If all $n_i$ are equal to $n/g$, the design is said to be balanced. It can be shown that
\[
\hat{\mu} = \frac{\sum_{i=1}^{g} \bar{y}_i/(\sigma_u^2 + \sigma^2/n_i)}{\sum_{i=1}^{g} 1/(\sigma_u^2 + \sigma^2/n_i)}
\qquad \text{and} \qquad
\hat{u}_i = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_i}\,\bar{c}_i,
\tag{5.32}
\]
where $\bar{y}_i$ and $\bar{c}_i$ are the average response and average composite residual, respectively, for the $i$th cluster (Searle et al. 1992). The building blocks for deriving $\dot{h}$ are as follows:
\[
X = 1_n, \qquad Z = \mathrm{blockdiag}(1_{n_1}, \ldots, 1_{n_g}), \qquad R = \sigma^2 I_n, \qquad G = \sigma_u^2 I_g,
\]
\[
X'X = n, \qquad Z'Z = D(n_i), \qquad X'Z = n' = (n_1, n_2, \ldots, n_g).
\]
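The closed forms in (5.32) can be verified against the generic GLS/BLUP formulas (5.14)-(5.15). A minimal sketch, assuming NumPy and using illustrative (unbalanced) cluster sizes and variance values:

```python
import numpy as np

# One-way random effects model: check the closed forms (5.32) for mu-hat and
# the u_i-hat against the generic GLS/BLUP formulas (5.14)-(5.15).
rng = np.random.default_rng(7)
ni = np.array([2, 4, 5])                 # unbalanced cluster sizes
g, n = len(ni), ni.sum()
s2, su2 = 1.0, 0.6                       # sigma^2, sigma_u^2
lab = np.repeat(np.arange(g), ni)        # cluster labels
X = np.ones((n, 1))
Z = np.zeros((n, g)); Z[np.arange(n), lab] = 1.0
y = rng.normal(size=n) + np.repeat(rng.normal(size=g), ni)

V = su2 * Z @ Z.T + s2 * np.eye(n)
Vinv = np.linalg.inv(V)
mu = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)[0]   # BLUE (5.14)
u = su2 * Z.T @ Vinv @ (y - mu)                           # BLUP (5.15)

ybar = np.array([y[lab == i].mean() for i in range(g)])
w = 1.0 / (su2 + s2 / ni)
assert np.isclose(mu, np.sum(w * ybar) / np.sum(w))       # mu-hat in (5.32)
cbar = ybar - mu                                          # cluster means of c
assert np.allclose(u, su2 * w * cbar)                     # u_i-hat in (5.32)
```

The cluster weights $1/(\sigma_u^2 + \sigma^2/n_i)$ appear explicitly in the check, which is the same quantity summed to form $k$ in the inversion results below.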
It follows that
\[
-\dot{h} =
\begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix}
=
\begin{pmatrix} n/\sigma^2 & n'/\sigma^2 \\[4pt]
n/\sigma^2 & D\!\left( \dfrac{\sigma^2 + n_i\sigma_u^2}{\sigma^2\sigma_u^2} \right) \end{pmatrix},
\tag{5.33}
\]
where the off-diagonal entries denote the vector $n = (n_1, \ldots, n_g)'$. The following result for a partitioned matrix with a scalar upper-left sub-matrix can be used to invert the matrix (Searle 1982):
\[
\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}
= \begin{pmatrix} \dfrac{1}{s} & -\dfrac{1}{s}BD^{-1} \\[6pt]
-\dfrac{1}{s}D^{-1}C & D^{-1} + \dfrac{1}{s}D^{-1}CBD^{-1} \end{pmatrix},
\tag{5.34}
\]
where $s = A - BD^{-1}C$. Applying this result, we find
\[
-\dot{h}^{-1} = \frac{1}{k}
\begin{pmatrix} 1 & -b' \\ -b & \sigma_u^4\,M(n_i,n_j) + k\,D\!\left( \dfrac{\sigma^2\sigma_u^2}{\sigma^2 + n_i\sigma_u^2} \right) \end{pmatrix},
\tag{5.35}
\]
where $b$ has elements $b_i = \sigma_u^2/(\sigma_u^2 + \sigma^2/n_i)$, $M(n_i,n_j)$ is the $g \times g$ matrix with $(i,j)$th element
\[
\frac{1}{(\sigma_u^2 + \sigma^2/n_i)(\sigma_u^2 + \sigma^2/n_j)},
\]
and
\[
k = \sum_{i=1}^{g} \frac{1}{\sigma_u^2 + \sigma^2/n_i}.
\]
When the design is balanced, all $n_i = n/g$, so that $k^{-1} = \sigma_u^2/g + \sigma^2/n$ and every ratio $\sigma_u^2/(\sigma_u^2 + \sigma^2/n_i)$ reduces to the common value $(\sigma_u^2/g)/(\sigma_u^2/g + \sigma^2/n)$; the elements of $-\dot{h}^{-1}$ simplify accordingly. Of particular interest are the first and second columns of $-\dot{h}^{-1}$:
\[
d_1 = \frac{1}{k}
\begin{pmatrix} 1 \\ -\dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_1} \\ \vdots \\ -\dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_g} \end{pmatrix}
\tag{5.36}
\]
and
\[
d_2 = \frac{\sigma_u^2}{k(\sigma_u^2 + \sigma^2/n_1)}
\begin{pmatrix} -1 \\ \dfrac{k\sigma^2}{n_1} + \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_1} \\ \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_2} \\ \vdots \\ \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_g} \end{pmatrix}.
\tag{5.37}
\]

5.7 Residual Variance Perturbation Scheme

In this section, a residual variance perturbation is proposed. The perturbation is applicable to the general form of the linear mixed model. Simplifications when $R = \sigma^2 I$ are presented, and some closed-form expressions for the one-way model are given.

5.7.1 Form of the Perturbation

Let $\sigma_{ij}$ denote the $ij$th element of the matrix $R$. A set of $n$ perturbations is introduced into the model such that the perturbed residual covariance matrix is
\[
R_\omega = D(\omega)^{-1/2}\, R\, D(\omega)^{-1/2},
\]
that is, the matrix with $i$th diagonal element $\sigma_{ii}/\omega_i$ and $(i,j)$th off-diagonal element $\sigma_{ij}/\sqrt{\omega_i\omega_j}$. The intent of this scheme is to weight the contribution of individual observations to inference in situations when they do not necessarily enter the likelihood in an additive fashion. The perturbation changes the variances but leaves the correlations among the error terms unchanged. The resulting local influence diagnostics can detect influential observations without upsetting the relationships assumed to exist among the residual errors.
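The correlation-preserving property of this scheme is immediate to check numerically. A small sketch, assuming NumPy, with an arbitrary valid covariance matrix (values are illustrative):

```python
import numpy as np

# The scheme R_omega = D(omega)^{-1/2} R D(omega)^{-1/2} rescales the residual
# variances (sigma_ii -> sigma_ii / omega_i) while leaving the residual
# correlations unchanged.
rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4))
R = A @ A.T + 4.0 * np.eye(4)              # a positive definite covariance matrix
w = np.array([0.5, 2.0, 1.0, 3.0])         # perturbation vector omega

Dw = np.diag(1.0 / np.sqrt(w))
Rw = Dw @ R @ Dw

corr = lambda S: S / np.sqrt(np.outer(np.diag(S), np.diag(S)))
assert np.allclose(corr(Rw), corr(R))      # correlations untouched
assert np.allclose(np.diag(Rw), np.diag(R) / w)   # variances rescaled
```

At $\omega = 1_n$ the original model is recovered, and when $R = \sigma^2 I$ the scheme reduces to the familiar per-observation variance weights $\sigma^2/\omega_i$.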
It simplifies to the error variance perturbation given for regression when $R = \sigma^2 I$.

5.7.2 Computational Building Blocks

To derive $\Delta = \partial h_\omega(\theta; y)/\partial\omega'|_0$, define $e = y - X\beta - Zu$ and $X^* = [X\; Z]$. We begin with the perturbed joint likelihood. Ignoring terms that do not depend on both $\theta$ and $\omega$, we have
\[
\begin{aligned}
L_\omega(u,\beta; y) &= \log\big( f(y \mid \beta, u)\, g(u \mid \beta) \big) \\
&\propto -\tfrac{1}{2}\big[ (y - X^*\theta)'R_\omega^{-1}(y - X^*\theta) \big] \\
&= -\tfrac{1}{2}\big[ y'R_\omega^{-1}y - 2\theta'X^{*\prime}R_\omega^{-1}y + \theta'X^{*\prime}R_\omega^{-1}X^*\theta \big].
\end{aligned}
\]
Taking derivatives with respect to $\theta$, we find
\[
h_\omega(\theta; y) = \frac{\partial L_\omega(u,\beta; y)}{\partial\theta}
= -\tfrac{1}{2}\big[ -2X^{*\prime}R_\omega^{-1}y + 2X^{*\prime}R_\omega^{-1}X^*\theta \big]
= X^{*\prime}R_\omega^{-1}(y - X^*\theta).
\]
Taking second partial derivatives with respect to $\theta$ and $\omega_i$ and evaluating at the original model, we obtain the $i$th column of $\Delta$:
\[
\Delta_i = X^{*\prime}R^{-1}\,S_i\,R^{-1}e,
\tag{5.38}
\]
where $S_i$ is the $n \times n$ matrix that is zero except in its $i$th row and column, which contain the elements $\tfrac{1}{2}\sigma_{1i}, \ldots, \tfrac{1}{2}\sigma_{(i-1)i}, \sigma_{ii}, \tfrac{1}{2}\sigma_{(i+1)i}, \ldots, \tfrac{1}{2}\sigma_{ni}$. When $R = \sigma^2 I$, $R_\omega^{-1} = \sigma^{-2}D(\omega)$, and the expression above simplifies, allowing us to write
\[
\Delta = \frac{1}{\sigma^2}X^{*\prime}D(e)
\]
and
\[
\ddot{M} = -2\Delta'\dot{h}^{-1}\Delta
= \frac{2}{\sigma^2}\,D(e)\,[X\; Z]
\begin{pmatrix} X'X & X'Z \\ Z'X & Z'Z + \sigma^2G^{-1} \end{pmatrix}^{-1}
\begin{pmatrix} X' \\ Z' \end{pmatrix} D(e).
\]

5.7.3 One-way Random Effects Model

Deriving results for the one-way random effects model in more detail should provide information about the perturbation's diagnostic value. First, consider influence on the intercept parameter estimate, $\hat{\mu}$. This can be accomplished by using either $m(\omega) = \hat{\mu}_\omega$ or $m(\omega) = PLD^{[\mu]}(\omega)$ as the influence function. The direction of maximum local influence for both measures is given by
\[
v_{\max} \propto \Delta' d_1 \propto D(e)\,[X\; Z]\,d_1 \propto
\begin{pmatrix}
\dfrac{\sigma^2/n_1}{\sigma_u^2 + \sigma^2/n_1}\, e_1 \\ \vdots \\
\dfrac{\sigma^2/n_g}{\sigma_u^2 + \sigma^2/n_g}\, e_g
\end{pmatrix},
\tag{5.39}
\]
where $e_i$ denotes the residuals for cluster $i$. Thus, the direction of largest local influence is proportional to the residuals, but with each cluster scaled according to its size. The coefficients upweight observations in smaller clusters and downweight observations in larger clusters. The reason for this is that $\hat{\mu}$ is not a simple average of the observations: the numerator of $\hat{\mu}$ in (5.32) can be written as
\[
\sum_{i=1}^{g}\sum_{j=1}^{n_i} \frac{y_{ij}}{n_i\sigma_u^2 + \sigma^2}.
\]
This indicates that observations from smaller clusters receive more weight.
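The cluster-rescaled form of (5.39) can be confirmed numerically. The sketch below (assuming NumPy; data and variances are illustrative) fits the one-way model via the mixed model equations, forms $\Delta = \sigma^{-2}X^{*\prime}D(e)$, and compares $\Delta' d_1$ to the scaled residuals:

```python
import numpy as np

# One-way model, residual variance perturbation with R = s2*I:
# check that Delta' d1 is proportional to the residuals with each cluster
# scaled by (s2/ni)/(su2 + s2/ni), as in (5.39).
rng = np.random.default_rng(2)
ni = np.array([2, 3, 5]); g, n = len(ni), ni.sum()
s2, su2 = 1.0, 0.8
lab = np.repeat(np.arange(g), ni)
X = np.ones((n, 1))
Z = np.zeros((n, g)); Z[np.arange(n), lab] = 1.0
y = rng.normal(size=n)
Xs = np.hstack([X, Z])                               # X* = [X Z]

neg_hdot = np.block([[X.T @ X / s2, X.T @ Z / s2],
                     [Z.T @ X / s2, Z.T @ Z / s2 + np.eye(g) / su2]])
theta = np.linalg.solve(neg_hdot, Xs.T @ y / s2)     # MME solution (mu, u)
e = y - Xs @ theta                                   # residuals
Delta = Xs.T @ np.diag(e) / s2                       # (5.38) with R = s2*I
d1 = np.linalg.inv(neg_hdot)[:, 0]                   # first column, as in (5.36)

v = Delta.T @ d1                                     # direction for mu-hat
w = np.repeat((s2 / ni) / (su2 + s2 / ni), ni) * e   # claimed form (5.39)
assert np.allclose(v / np.linalg.norm(v), w / np.linalg.norm(w))
```

Changing `ni` to equal cluster sizes makes the scaling constant across clusters, reproducing the balanced-case result that $v_{\max} \propto e$.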
Note that if either the design is balanced or $\sigma_u^2 = 0$, then $v_{\max} \propto e$.

We now consider influence on the predicted random effects. Without loss of generality, let us examine influence on $\hat{u}_1$. The direction of largest local influence is given by
\[
v_{\max} \propto \Delta' d_2 \propto
\begin{pmatrix}
\left( \dfrac{k\sigma^2}{n_1} - \dfrac{\sigma^2/n_1}{\sigma_u^2 + \sigma^2/n_1} \right) e_1 \\
-\dfrac{\sigma^2/n_2}{\sigma_u^2 + \sigma^2/n_2}\, e_2 \\ \vdots \\
-\dfrac{\sigma^2/n_g}{\sigma_u^2 + \sigma^2/n_g}\, e_g
\end{pmatrix}.
\tag{5.40}
\]
Although the individual cluster sizes play into the weights, the coefficient of $e_1$ can be written as
\[
\frac{k\sigma^2}{n_1} - \frac{\sigma^2/n_1}{\sigma_u^2 + \sigma^2/n_1}
= \frac{\sigma^2}{n_1}\sum_{i \neq 1} \frac{1}{\sigma_u^2 + \sigma^2/n_i},
\]
so that the residuals from the first cluster have weights roughly $g - 1$ times greater than the others. In the balanced case, all $n_i = n/g$, and this implies
\[
\sum_{i \neq 1} \frac{\sigma^2/n_i}{\sigma_u^2 + \sigma^2/n_i}
= \frac{(g-1)\sigma^2/n}{\sigma^2/n + \sigma_u^2/g}.
\]
It follows that
\[
v_{\max} \propto \big( (g-1)e_1',\; -e_2',\; \ldots,\; -e_g' \big)'.
\]
Thus, in the balanced case, the residuals of the first cluster have weights that are exactly $g - 1$ times larger than the others. In fact, the BLUP of $u_1$ is given by
\[
\hat{u}_1 = \frac{\sigma_u^2/g}{\sigma^2/n + \sigma_u^2/g}
\left( \frac{g-1}{g}\,\bar{y}_1 - \frac{1}{g}\sum_{i=2}^{g} \bar{y}_i \right).
\]
Not surprisingly, maximum influence on $\hat{u}_i$ is achieved primarily by perturbing the variance of observations in cluster $i$.

5.8 Cluster Variance Perturbation Schemes

In this section, two cluster-based perturbation schemes are considered for assessing how clusters of observations influence estimation and prediction.

5.8.1 A Cluster-based Case-weight Perturbation

For simplicity, we consider the model given in Lesaffre and Verbeke (1998), (5.25), but with the diagonality constraint for $R_i$ dropped. To perturb the model, we let $G_i$ become $\frac{1}{\omega_i}G_i$ and $R_i$ become $\frac{1}{\omega_i}R_i$ for $i = 1, \ldots, g$. The perturbation of both matrices is important: it preserves the correlations among the random effects as well as the correlations among observations in the same cluster. Hence, the only effect of the perturbation is to change the relative weighting of the clusters, allowing us to assess the influence of clusters of observations.
5.8.2 Computational Building Blocks

To derive $\Delta$ we begin with the perturbed joint likelihood:
\[
\begin{aligned}
L_\omega(u,\beta; y) &= \log\big( f(y \mid \beta, u)\, g(u \mid \beta) \big) \\
&\propto -\tfrac{1}{2}\big[ (y - X^*\theta)'R_\omega^{-1}(y - X^*\theta) + u'G_\omega^{-1}u \big] \\
&= -\tfrac{1}{2}\big[ y'R_\omega^{-1}y - 2\theta'X^{*\prime}R_\omega^{-1}y + \theta'X^{*\prime}R_\omega^{-1}X^*\theta + u'G_\omega^{-1}u \big].
\end{aligned}
\]
Partial derivatives with respect to the fixed effects $\beta$ are
\[
\frac{\partial L_\omega}{\partial\beta}
= X'R_\omega^{-1}(y - X\beta - Zu)
= \sum_{i=1}^{g} \omega_i X_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i).
\]
Partial derivatives with respect to $u$ have $i$th sub-vector
\[
\frac{\partial L_\omega}{\partial u_i}
= \omega_i\big( Z_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i) - G_i^{-1}u_i \big).
\]
The perturbed estimating functions are then
\[
h_\omega(\theta; y) =
\begin{pmatrix} \sum_{i=1}^{g} \omega_i X_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i) \\[4pt]
\big\{ \omega_i\big( Z_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i) - G_i^{-1}u_i \big) \big\}_{i=1}^{g} \end{pmatrix}.
\tag{5.41}
\]
It follows that the $i$th column of $\Delta$ is
\[
\Delta_i = \frac{\partial h_\omega(\theta; y)}{\partial\omega_i}\bigg|_0
= \begin{pmatrix} X_i'R_i^{-1}\hat{e}_i \\ Z_i'R_i^{-1}\hat{e}_i - G_i^{-1}\hat{u}_i \end{pmatrix},
\tag{5.42}
\]
with the lower portion appearing in the rows corresponding to $u_i$ and zeros elsewhere. This expression can be further simplified. Because the matrices $Z$, $R$, and $G$ are all block-diagonal, the BLUPs for the $i$th cluster can be written as
\[
\hat{u}_i = (Z_i'R_i^{-1}Z_i + G_i^{-1})^{-1}Z_i'R_i^{-1}c_i,
\tag{5.43}
\]
where $c_i$ is the vector of composite residuals for the $i$th cluster. Substituting this into the bottom portion of (5.42), we find
\[
\begin{aligned}
Z_i'R_i^{-1}\hat{e}_i - G_i^{-1}\hat{u}_i
&= Z_i'R_i^{-1}(c_i - Z_i\hat{u}_i) - G_i^{-1}\hat{u}_i \\
&= Z_i'R_i^{-1}c_i - (Z_i'R_i^{-1}Z_i + G_i^{-1})\hat{u}_i \\
&= Z_i'R_i^{-1}c_i - (Z_i'R_i^{-1}Z_i + G_i^{-1})(Z_i'R_i^{-1}Z_i + G_i^{-1})^{-1}Z_i'R_i^{-1}c_i = 0.
\end{aligned}
\]
This implies that only the top $p_1$ rows of $\Delta$ are non-zero. The matrix can be written compactly as
\[
\Delta = \begin{pmatrix} X'R^{-1}D(\hat{e}_i) \\ 0_{p_2 \times g} \end{pmatrix},
\tag{5.44}
\]
where $D(\hat{e}_i)$ is an $n \times g$ block-diagonal matrix whose $i$th block is $\hat{e}_i$.

The formula has several implications. First, the acceleration matrix for the influence measure $PLD(\omega)$ can be written as
\[
\begin{aligned}
\ddot{M} &= -2\Delta'\dot{h}^{-1}\Delta
= -2\big( D(\hat{e}_i)'R^{-1}X \;\;\; 0 \big)
\begin{pmatrix} \dot{h}^{11} & \dot{h}^{12} \\ \dot{h}^{21} & \dot{h}^{22} \end{pmatrix}
\begin{pmatrix} X'R^{-1}D(\hat{e}_i) \\ 0 \end{pmatrix} \\
&= -2D(\hat{e}_i)'R^{-1}X\,\dot{h}^{11}\,X'R^{-1}D(\hat{e}_i)
= 2D(\hat{e}_i)'R^{-1}X(X'V^{-1}X)^{-1}X'R^{-1}D(\hat{e}_i).
\end{aligned}
\tag{5.45}
\]
This matrix has at most $p_1$ distinct non-zero eigenvalues regardless of the number of random effects. In addition, using the results given in Section 5.5.3, the acceleration matrix for the influence measure $PLD^{[\beta]}(\omega)$ is equivalent to (5.45). Finally, the acceleration matrix for $PLD^{[u]}(\omega)$ is
\[
\ddot{M} = -2D(\hat{e}_i)'R^{-1}X\big( \dot{h}^{11} - \dot{h}_{11}^{-1} \big)X'R^{-1}D(\hat{e}_i)
= 2D(\hat{e}_i)'R^{-1}X\big( (X'V^{-1}X)^{-1} - (X'R^{-1}X)^{-1} \big)X'R^{-1}D(\hat{e}_i).
\tag{5.46}
\]
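Two consequences of (5.44)-(5.45) are easy to confirm numerically: the bottom rows of $\Delta$ vanish at the fitted values, and the acceleration matrix has rank at most $p_1$. A sketch, assuming NumPy, with an illustrative two-fixed-effect, random-intercept model:

```python
import numpy as np

# Cluster case-weight perturbation: the bottom block of each Delta column,
# Z_i'R_i^{-1} e_i - G_i^{-1} u_i, vanishes at the fitted values, so the
# acceleration matrix (5.45) has at most p1 non-zero eigenvalues.
rng = np.random.default_rng(3)
ni = np.array([3, 4, 5]); g, n = len(ni), ni.sum()
s2, su2, p1 = 1.2, 0.6, 2
lab = np.repeat(np.arange(g), ni)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, g)); Z[np.arange(n), lab] = 1.0
R, G = s2 * np.eye(n), su2 * np.eye(g)
y = rng.normal(size=n)

Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
sol = np.linalg.solve(C, np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y]))
beta, u = sol[:p1], sol[p1:]
e = y - X @ beta - Z @ u

for i in range(g):                       # bottom block of Delta_i is zero
    bottom = e[lab == i].sum() / s2 - u[i] / su2
    assert np.isclose(bottom, 0.0)

De = np.zeros((n, g)); De[np.arange(n), lab] = e     # block-diagonal D(e_i)
V = Z @ G @ Z.T + R
M = 2 * De.T @ Rinv @ X @ np.linalg.inv(X.T @ np.linalg.inv(V) @ X) \
      @ X.T @ Rinv @ De                              # (5.45)
assert np.linalg.matrix_rank(M) <= p1
```

Here the rank bound holds no matter how many clusters (columns of $Z$) are added, which is what makes the scheme cheap even for models with many random effects.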
Finally, the acceleration matrix for PLD^^^u:) is M Diei)'R-^X ih-'- Kl ) X'R-^D{ei) •- -J (5.46) = 2D{ei)'R-^X{{X'V-^X)-^ - {X'R-^X)-^)X'R-^D{ei). /f ? 163 5.8.3 One-way Random Effects Model Using (5.30), Vmax for directed influence on a linear combination of the intercept and random effect point inferences is given by Umax oc A'^ X* a oc h X* ,11 12 h'' h" X gxg h'' h Eni Eni Using the relationship between Cj and Uj given in (5.32), X 12 X OC »» "i^i — '^i cTg + a^/ui „ (5.47) (5.48) This implies that the direction of maximum curvature is proportional to m, the vector of BLUPs. Thus, we see that the form of A has an interesting consequence when the intercept is the only fixed effect in the model: no matter what linear combination is chosen, the direction of maximum curvature is the same. This anomaly does not occur when ^ has more than element. 164 5.8.4 An Alternative Cluster Variance Perturbation Scheme In this section, the variance perturbation scheme proposed by Beckman et al. (1987) is addressed. For the one-way random effects model, the perturbation is V(ui) = Wjcr^. As each uji approaches infinity, V(«i) — > oo and V{yij) — > oo. Clusters perturbed to have higher random effects variance are deemed to provide less reliable information for inference. Also, the correlation between observations in the same cluster approaches 1 as Wi — > oo: Thus, observations from clusters that are perturbed to have higher random effects variance are deemed more alike, while observations from clusters perturbed to have smaller random effects variance are deemed less alike. This is in contrast to the perturbation given previously, which preserves the intra-cluster correlations. Also, the alternate perturbation does not reweight the clusters' entire contributions to the likelihood. Since it does not correspond to a cluster weight, it provides diagnostic information about the assumption of equal cluster variance rather than the influence of clusters. 
By using $V(u_i) = \frac{1}{\omega_i}\sigma_u^2$ rather than $V(u_i) = \omega_i\sigma_u^2$, we can obtain $v_{\max}$ for this perturbation from results paralleling those for the first cluster perturbation. Using the partial derivatives from Section 5.8.2, we may derive
\[
\Delta = \begin{pmatrix} 0 \\ -G^{-1}D(\hat{u}) \end{pmatrix}.
\]
The acceleration matrix is then
\[
\begin{aligned}
\ddot{M} &= -2\Delta'\dot{h}^{-1}\Delta
= -2\big( 0 \;\;\; -D(\hat{u})'G^{-1} \big)
\begin{pmatrix} \dot{h}^{11} & \dot{h}^{12} \\ \dot{h}^{21} & \dot{h}^{22} \end{pmatrix}
\begin{pmatrix} 0 \\ -G^{-1}D(\hat{u}) \end{pmatrix} \\
&= -2D(\hat{u})'G^{-1}\dot{h}^{22}G^{-1}D(\hat{u}) \\
&= 2D(\hat{u})'G^{-1}\Big( (Z'R^{-1}Z + G^{-1})^{-1}
+ GZ'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}ZG \Big) G^{-1}D(\hat{u}).
\end{aligned}
\tag{5.49}
\]
This matrix has at most $p_2$ distinct non-zero eigenvalues regardless of the number of fixed effects. In addition, using the results given in Section 5.5.3, the acceleration matrix for the influence measure $PLD^{[u]}(\omega)$ is equivalent to (5.49). Finally, the acceleration matrix for $PLD^{[\beta]}(\omega)$ is
\[
\ddot{M} = -2D(\hat{u})'G^{-1}\big( \dot{h}^{22} - \dot{h}_{22}^{-1} \big)G^{-1}D(\hat{u})
= 2D(\hat{u})'Z'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}ZD(\hat{u}).
\tag{5.50}
\]
As a special case, consider the one-way random effects model. Using Theorem 2.1, the direction of maximum curvature for $\hat{\mu}$ is
\[
v_{\max} \propto \Delta' d_1 \propto
\begin{pmatrix}
\dfrac{\hat{u}_1}{\sigma_u^2 + \sigma^2/n_1} \\ \vdots \\
\dfrac{\hat{u}_g}{\sigma_u^2 + \sigma^2/n_g}
\end{pmatrix}.
\tag{5.51}
\]
Here, the elements corresponding to larger clusters receive more weight. This is in contrast to the first cluster perturbation scheme, for which $v_{\max} \propto \hat{u}$. The direction of maximum curvature for $\hat{u}_1$ is given by
\[
v_{\max} \propto \Delta' d_2 \propto
\begin{pmatrix}
\left( \dfrac{k\sigma^2}{n_1} + \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_1} \right)\hat{u}_1 \\
\dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_2}\,\hat{u}_2 \\ \vdots \\
\dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_g}\,\hat{u}_g
\end{pmatrix},
\tag{5.52}
\]
where $k$ was defined after (5.35). The coefficient for the first element is approximately $1 + g\sigma^2/(n_1\sigma_u^2)$ times larger than the others, indicating that influence on the first BLUP is accomplished largely by perturbing the variance of $u_1$. The direction of maximum curvature for $\hat{\mu} + \hat{u}_1$ is even more dominated by the first element:
\[
v_{\max} \propto
\begin{pmatrix}
\Big( \sum_{j \neq 1} \dfrac{1}{\sigma_u^2 + \sigma^2/n_j} \Big)\,\hat{u}_1 \\
-\dfrac{\hat{u}_2}{\sigma_u^2 + \sigma^2/n_2} \\ \vdots \\
-\dfrac{\hat{u}_g}{\sigma_u^2 + \sigma^2/n_g}
\end{pmatrix}.
\tag{5.53}
\]
Here, the first element is approximately $g - 1$ times larger than the other elements.

5.9 Application to Corn Crop Data

In this section, the residual variance perturbation and cluster variance perturbation schemes are applied to an agricultural data set.
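As a closing numerical check before turning to the application, the direction (5.51) for the alternate cluster scheme can be verified directly from $\Delta = (0',\; -(G^{-1}D(\hat{u}))')'$ and the first column $d_1$ of $-\dot{h}^{-1}$. A sketch, assuming NumPy, with illustrative unbalanced data:

```python
import numpy as np

# Alternate cluster variance scheme, one-way model: Delta = [0; -G^{-1} D(u)],
# and Delta' d1 should equal (1/k) * u_i / (su2 + s2/n_i), matching (5.51).
rng = np.random.default_rng(4)
ni = np.array([2, 4, 6]); g, n = len(ni), ni.sum()
s2, su2 = 1.0, 0.9
lab = np.repeat(np.arange(g), ni)
X = np.ones((n, 1))
Z = np.zeros((n, g)); Z[np.arange(n), lab] = 1.0
y = rng.normal(size=n)

C = np.block([[X.T @ X / s2, X.T @ Z / s2],
              [Z.T @ X / s2, Z.T @ Z / s2 + np.eye(g) / su2]])
sol = np.linalg.solve(C, np.vstack([X.T, Z.T]) @ y / s2)   # MME solution
u = sol[1:]                                                # BLUPs

Delta = np.vstack([np.zeros((1, g)), -np.diag(u) / su2])
d1 = np.linalg.inv(C)[:, 0]                                # as in (5.36)

v = Delta.T @ d1
k = np.sum(1.0 / (su2 + s2 / ni))
w = u / (su2 + s2 / ni)                                    # claimed form (5.51)
assert np.allclose(v, w / k)
```

The weights $1/(\sigma_u^2 + \sigma^2/n_i)$ grow with $n_i$, reproducing the observation that larger clusters receive more weight under this scheme.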
After briefly reviewing the data set and the goals of the analysis, local influence techniques are applied. Graphical results indicate that the data set has an influential observation.

5.9.1 Data

Battese et al. (1988) analyzed reported areas of corn crops from 12 Iowa counties using a variance components model. Keeping the same notation as in Battese et al. (1988), the model is
\[
y_{ij} = \beta_0 + u_i + \beta_{corn}x_{1ij} + \beta_{soy}x_{2ij} + e_{ij},
\]
where

$y_{ij}$ = reported hectares of corn in segment $j$ from county $i$,
$x_{1ij}$ = no. of pixels of corn in segment $j$ from county $i$ as per satellite imagery,
$x_{2ij}$ = no. of pixels of soy in segment $j$ from county $i$ as per satellite imagery,
$u_i \sim$ i.i.d. $N(0, \sigma_u^2)$, $\quad e_{ij} \sim$ i.i.d. $N(0, \sigma^2)$,
$i = 1, \ldots, 12$, $\quad j = 1, \ldots, n_i$, $\quad (n_i) = (1,1,1,2,3,3,3,3,4,5,5,6)$, $\quad \sum n_i = 37$.

Of particular note in the data set is observation 33. The reported hectares for this case are exactly the same as those of the previous observation despite having vastly different covariate values. Battese et al. (1988) felt that the value was misrecorded, and did not include it in their analysis. Appendix B contains model-fitting information from an analysis in which the observation is included. Note that the regressors are also centered and scaled for this analysis. Additionally, the variance components have been estimated by restricted maximum likelihood (REML). By not perturbing the REML estimating functions, the local influence techniques described in this chapter can be used to assess the effects of perturbations on $\hat{\beta}$ and $\hat{u}$ while effectively treating the variance components as known.

The goal of the analysis is to first establish a linear relationship between the satellite data, which is readily available, and the actual reported data, which is accurate but not available for all segments in all counties. The resulting model can then be used to make predictions for segments in which reported hectares are unavailable. Interest centers on predicting the population mean hectares of corn per segment.
Because the satellite data are available for all segments in each of the 12 counties, the population mean number of pixels of corn and soy in county $i$ is known. These means are denoted $\bar{x}_{1i(p)}$ and $\bar{x}_{2i(p)}$, respectively. The corresponding population mean hectares of corn per segment in county $i$ is then defined as
\[
\mu_i = \beta_0 + u_i + \beta_{corn}\bar{x}_{1i(p)} + \beta_{soy}\bar{x}_{2i(p)}.
\]
Point inferences for these values can be made using the corresponding BLUPs:
\[
\hat{\mu}_i = \hat{\beta}_0 + \hat{u}_i + \hat{\beta}_{corn}\bar{x}_{1i(p)} + \hat{\beta}_{soy}\bar{x}_{2i(p)}.
\]

5.9.2 Residual Analysis

Figure 5.1 contains diagnostic plots of the corn data. The first plot is of the 37 composite residuals $c = y - X\hat{\beta}$. The three single-observation counties appear first, followed by the county with two observations, etc. Cases from the same county (cluster) are joined by a line, a practice suggested by Hilden-Minton (1995). This plot indicates that observation 33 is the most outlying from the overall estimated regression curve, $X\hat{\beta}$. Figure 5.1(b) shows the residuals $e = y - X\hat{\beta} - Z\hat{u}$. This shows that even when taking into account the county effects, observation 33 is the most outlying. Figure 5.1(c) shows both sets of residuals together, with the composite residuals represented by stars and the residuals represented by triangles. We note that the largest difference among them is for counties 5 and 11, where the composite residuals are nearly all smaller. Not surprisingly, counties 5 and 11 have the largest BLUPs in Figure 5.1(d). Also, because the residual for case 33 is large and of a different sign than the other residuals for county 12, it appears that $\hat{u}_{12}$ would be considerably larger if not for observation 33. A local influence analysis can assess this possibility more thoroughly.

5.9.3 Scheme 1: Residual Variance Perturbation

Figure 5.2 contains local influence diagnostics for perturbing the error variance in the corn model. The first plot shows the basic curvatures, $C_b$, that result from one-at-a-time perturbations.
Observation 33 is considerably more influential than the others owing to its noticeably large curvature. The element for case 33 dominates the direction of largest curvature (Figure 5.2(b)). If all elements were contributing equally, they would equal either $1/\sqrt{37} = .164$ or $-.164$. Therefore, the value for element 33 is over five times larger than the value of equally-contributing elements (e.c.e.). Plots (c), (d), and (e) are the directions of maximum curvature for directed influence on the intercept, the CORN slope, and the SOY slope, respectively. The plots indicate that case 33 is a dominant contributor to the SOY slope in particular. Plot (f) displays the largest curvatures for directed influence on the 12 predicted random effects. This plot shows which BLUPs are most sensitive to perturbation. Here we see that county 12's BLUP is the most sensitive to the case weights. This is the county containing the questionable data point.

Figure 5.1: Corn data; diagnostic plots. (a) Composite residuals; (b) residuals; (c) residuals and composite residuals; (d) county BLUPs.

Figure 5.2: Corn data; Scheme 1; local influence diagnostics. (a) Basic curvatures; (b) $v_{\max}$; (c) $v_{\max}$ for $\beta_0$; (d) $v_{\max}$ for $\beta_{corn}$; (e) $v_{\max}$ for $\beta_{soy}$; (f) $C_{\max}$'s for each of the $\hat{u}_i$.

Figure 5.3 shows the directional vectors $v_{\max}$ for directed influence on the individual BLUPs. Typically, it would be unnecessary to examine all of these plots, especially if there are a large number of random effects in the model. Nonetheless, all plots have been included here to more fully demonstrate their behavior.
Upon inspection, maximum perturbation of $\hat{u}_i$ is generally achieved by perturbing cases in cluster $i$. This is not surprising given the closed-form results for the one-way random effects model in Section 5.7.3. Only in plots (a) and (g) do we see an observation in another county exerting influence. In both cases, observation 33 is the influential out-of-cluster data point.

Figure 5.4 contains local influence diagnostics associated with point inferences for linear combinations of fixed and random effects. Recall that in Chapter 2, MaxFITS was proposed as a measure of how sensitive fitted values are to residual variance perturbation, while BasicFITS was proposed as a measure of how much a single observation determines its fitted value. These diagnostics are $C_{\max}$ and $C_b$ values for $\ddot{M}$, respectively. Figure 5.4(a) shows MaxFITS (stars) and BasicFITS (triangles). The values for MaxFITS indicate that the predictions for cases 33 and 34 are noticeably more sensitive to perturbation than the other predictions. The sensitivity of prediction 33 is largely a function of perturbing this case, as demonstrated by the fact that its BasicFITS value is nearly as large as its MaxFITS value. This is confirmed by $v_{\max}$ for directed influence on the fitted value of case 33: element 33 is by far the largest contributor (plot (c)). However, the sensitivity of case 34's prediction is not largely a function of perturbing case 34, as demonstrated by a moderately sized BasicFITS value. The source of the influence is in fact case 33, as is evident in the plot of $v_{\max}$ for directed influence on the fitted value of case 34 (plot (d)). In short, the fitted values for observations 33 and 34 are both sensitive because they are greatly influenced by the model's assumption that observation 33 is just as reliable as the rest of the data.
Figure 5.3: Corn data; Scheme 1; directed influence on individual BLUPs. Panels (a)-(l) show $v_{\max}$ for $\hat{u}_1$ through $\hat{u}_{12}$.

Of course, the main goal of the analysis is to obtain predictions for the mean hectares per segment in each of the 12 counties. To determine whether these predictions are sensitive to Scheme 1, a variation of BasicFITS and MaxFITS is used. First, the 12 $C_{\max}$ values for directed influence on county mean predictions are obtained. These are used to assess the relative sensitivity of the predictions. Next, the 37 basic curvatures, $C_b$, are calculated for individual segments as directed upon the prediction for the corresponding county. This allows us to examine how much one-at-a-time perturbations within a cluster affect that cluster's predicted mean. Finally, the 12 $C_{\max}$ values and the 37 $C_b$ values are plotted together (Figure 5.4(b)). Here the $C_b$ values are plotted against case indices, and the $C_{\max}$ values are plotted against the case indices of observations in the corresponding county. (Plot (b) is essentially a BasicFITS/MaxFITS plot when only one prediction is of interest in each cluster.)
The implications for the corn analysis are two-fold. First, the prediction for county 12 is clearly the most sensitive, and this is due largely to the effects of perturbing case 33. Second, the influence that case 33 showed on $\hat{\beta}_{soy}$, $\hat{u}_1$, and $\hat{u}_7$ (Figures 5.2e, 5.3a, 5.3g) does not translate into noticeable influence on the overall predicted means in the other 11 counties.

5.9.4 Scheme 2: Cluster-based Variance Perturbation

Having seen the influence of perturbing case 33's variance, we may want to assess the effects of perturbing the variance associated with county 12. To this end, we implement the cluster-based perturbation scheme suggested in Section 5.8.1. Since there are only 12 perturbations, we implement the strategy mentioned in Section 2.6, including the benchmarks $\pm 2/\sqrt{12} = \pm .577$ and $\pm 3/\sqrt{12} = \pm .866$ as horizontal lines. This should prevent us from confusing somewhat large elements with truly prominent ones. Also, directional vectors are rendered as star plots as well as index plots. Recall that a star plot is merely an alternative representation that suppresses the signs of the elements, but does not give the illusion of ordering that index plots do. Each element is displayed as a point, with the distance from the origin equal to its absolute value. The plot is completed by labelling the points with indices and drawing rings to represent the benchmarks $2/\sqrt{12}$ and $3/\sqrt{12}$.

Figure 5.4: Corn data; Scheme 1; directed influence on predictions. (a) BasicFITS and MaxFITS; (b) $C_{\max}$ and $C_b$ for predicted mean hectares per segment; (c) $v_{\max}$ for $\hat{y}_{33}$; (d) $v_{\max}$ for $\hat{y}_{34}$.

Figure 5.5 contains index plots and star plots of the three distinct eigenvectors resulting from the cluster perturbation.
Element 12 of v_max is almost three times the value of equally contributing elements (plots (a) and (b)). Element 11 of v2 is nearly two times that of e.c.e. (plots (c) and (d)), while element 8 of v3 is about two times that of e.c.e. (plots (e) and (f)). Nonetheless, the second two eigenvectors tend to be more noise than signal. To explore further how cluster 12 is affecting inference, Figure 5.6 gives directions of maximum curvature for directed influence on each of the individual BLUEs under the original parameterization. Element 12, together with element 11, appears prominently in v_max for both of the slope estimates. This indicates that the slopes, rather than the intercept, are being influenced by county 12. To assess effects of the perturbation on the set of u_i, Figure 5.7 gives the three distinct eigenvectors of the acceleration matrix for u. The direction of largest curvature shows that element 11 is more than two times that of equally contributing elements, while element 12 is roughly three times that of e.c.e. in the second eigenvector. Finally, element 8 is most prominent in the third eigenvector. Because element 12 stands out in v2, but not v_max, these plots are inconclusive as to whether county 12 is influential as a whole on the BLUPs. Finally, we address the main question of interest: whether or not the predicted mean hectares per segment are sensitive to the perturbation. Since there is only one prediction per county, we can use the perturbation scheme to produce cluster-based analogs of BasicFITS and MaxFITS (Figure 5.8). The MaxFITS values show that all of the predicted means show comparable sensitivity to perturbation.
[Figure 5.5: Corn data; Scheme 2; local influence: index plots and star plots of v_max, v2, and v3.]

[Figure 5.6: Corn data; Scheme 2; v_max's for individual BLUEs: index plots and star plots of v_max for β̂0, β̂corn, and β̂soy.]

The small BasicFITS values indicate that none of the counties is self-predicting, not even county 12.

5.9.5 Scheme 3: Alternate Cluster-based Variance Perturbation

The alternate cluster variance scheme can also be applied to the data set. The acceleration matrix of the likelihood displacement has twelve non-zero eigenvalues. The eigenvectors associated with the first two eigenvalues appear in Figure 5.9 as index plots and star plots, and both vectors implicate elements 5 and 11 as being influential. The other ten eigenvectors are each dominated by a single unique element, and are not shown. Figure 5.10 shows the directions of maximum curvature for directed influence on each of the individual BLUEs. Although a couple of elements exceed the 2/√12 benchmark, the plots are relatively noisy. The small value for element twelve in all of the plots suggests that perturbing county 12's random effect variance has little effect on the fixed effects estimates. Finally, Figure 5.11 provides the BasicFITS/MaxFITS plot for the alternate cluster perturbation. The first thing we notice about the plot is that the BasicFITS values are nearly as large as the MaxFITS values. This suggests that each μ̂i is most sensitive to the perturbation of V(ui). We also recall that for the simple random sample, the i-th element of v_max for μ̂i was roughly g - 1 times larger than the other elements.
Hence, Figure 5.11 suggests that this dominance may be maintained in more complicated models. If this is true, it may be sufficient to plot only the MaxFITS values. For the corn data, the MaxFITS values indicate that the predicted means for counties with large BLUPs (counties 5 and 11) are most sensitive to the perturbation of the cluster variance.

[Figure 5.7: Corn data; Scheme 2; local influence on u: index plots and star plots of v_max, v2, and v3.]

[Figure 5.8: Corn data; Scheme 2; local influence on predicted means: cluster-based BasicFITS and MaxFITS.]

[Figure 5.9: Corn data; Scheme 3; local influence: index plots and star plots of v_max and v2.]

5.9.6 Conclusions and Action

Having examined these plots, the analyst should be very wary of observation 33. Scheme 1 indicates that if its precision varies from that of the other segments in county 12, the county's predicted mean hectares of corn per segment is affected (Figure 5.4(b)). Using the weighted county perturbation scheme leads to inconclusive evidence about county 12's overall influence. Although element 12 figures prominently in the v_max's for the two slope estimates (Figure 5.6), even the 12th county's predicted mean appears unaffected (Figure 5.8). Influence diagnostics for perturbing cluster variances (Figures 5.9, 5.10, and 5.11) suggest that the assumption of equal variance for the 12th random effect is not influential. Therefore, it is deemed that differential weighting of observation 33 is the main cause of sensitivity, and since the value was considered suspect a priori, deletion or substantial down-weighting of this single observation seems warranted.
[Figure 5.10: Corn data; Scheme 3; v_max's for individual BLUEs: index plots and star plots of v_max for β̂0, β̂corn, and β̂soy.]

[Figure 5.11: Corn data; Scheme 3; influence on predicted means: basic curvatures and maximum curvatures.]

CHAPTER 6
MULTI-STAGE ESTIMATIONS AND FUTURE RESEARCH

In this chapter, several areas of future work are outlined, most of which stem from the generalization of local assessment to estimating functions in Chapter 4. First, some preliminary results are given for extending the technique to multi-stage estimations. Then, a synopsis of the dissertation is presented along with some additional areas of future research.

6.1 Local Assessment in Multi-stage Estimations

Sometimes inference takes place in more than one formal stage. One example is linear mixed models when variance components are unknown. Restricted/residual maximum likelihood (REML) (Patterson and Thompson 1971; Robinson 1987; Searle et al. 1992) can be used to estimate the variance parameters. Then the mixed model equations are used to estimate β and predict u, with the estimated variance parameters treated as known. Local influence techniques can be used to assess the effects of perturbations on functions of the two sets of parameter estimates using results from Chapter 4. First, let θ̂1 denote the solution of p1 estimating equations h1(θ1; y) = 0. Further, let θ̂2 be the solution of p2 estimating equations h2(θ2; y, θ̂1) = 0 for fixed θ̂1. Perturbation leads to two sets of perturbed equations, h1,ω(θ1; y) = 0 and h2,ω(θ2; y, θ̂1,ω) = 0, with solutions θ̂1,ω and θ̂2,ω respectively.
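As a toy illustration of this two-stage structure (not the REML/mixed-model fit itself; the equations and names below are invented for the sketch), one can solve the stage-one equations first and then plug the solution into the stage-two equations:

```python
import numpy as np

def newton_solve(h, hdot, x0, tol=1e-10, max_iter=50):
    """Newton's method for a system of estimating equations h(x) = 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hdot(x), h(x))
        x = x - step
        if np.max(np.abs(step)) < tol:
            break
    return x

# Stage 1: h1(theta1) = theta1^2 - 4 = 0, started near the positive root
theta1 = newton_solve(lambda t: np.array([t[0] ** 2 - 4.0]),
                      lambda t: np.array([[2.0 * t[0]]]),
                      [1.0])

# Stage 2: h2(theta2; theta1) = theta2 - 3*theta1 = 0, with theta1 held
# fixed at its stage-one solution (as the variance parameters are held
# fixed in the mixed model equations)
theta2 = newton_solve(lambda t: np.array([t[0] - 3.0 * theta1[0]]),
                      lambda t: np.array([[1.0]]),
                      [0.0])
```

A perturbation ω would enter both h1 and h2, and the perturbed solutions θ̂1,ω and θ̂2,ω would be tracked in exactly the same way.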
The effects of perturbation are measured by m, which depends on ω explicitly and via θ̂1,ω and θ̂2,ω. The algebraic expressions and computational formulas for first- and second-order influence given in Chapter 4 can be used by simply adopting the notation θ' = (θ1', θ2'). Because h1,ω(θ1; y) does not depend on θ̂2,ω, there are some simplifications in the computational formulas. Writing ḣ for the Jacobian of the stacked estimating equations with respect to θ, with blocks ḣij = ∂hi/∂θj', the Jacobian is block lower-triangular, and using results on partitioned matrices (Searle 1982),

\[
\dot{h}^{-1} =
\begin{bmatrix} \dot{h}_{11} & 0 \\ \dot{h}_{21} & \dot{h}_{22} \end{bmatrix}^{-1}
=
\begin{bmatrix} \dot{h}_{11}^{-1} & 0 \\ -\dot{h}_{22}^{-1}\dot{h}_{21}\dot{h}_{11}^{-1} & \dot{h}_{22}^{-1} \end{bmatrix}.
\qquad (6.1)
\]

Partitioning Δ leads to a simplification in J:

\[
J = -\dot{h}^{-1}
\begin{bmatrix} \Delta_1 \\ \Delta_2 \end{bmatrix}
=
\begin{bmatrix} -\dot{h}_{11}^{-1}\Delta_1 \\ \dot{h}_{22}^{-1}\left(\dot{h}_{21}\dot{h}_{11}^{-1}\Delta_1 - \Delta_2\right) \end{bmatrix}.
\qquad (6.2)
\]

6.2 Synopsis and Additional Future Research

The local influence approach to diagnostics is largely unknown, owing in part to the lack of introductory resources on the technique. It is hoped that the extended and detailed discussion in the first two chapters of this thesis has shed some light on this powerful and practical methodology. The diagnostics derived for perturbing the entire X matrix in Chapter 3 give previously unavailable detail about how collinearity and overfitting can affect estimation. The results should encourage researchers to utilize perturbations other than just case-weights and place more emphasis on finding closed forms for the resulting diagnostics. Because analysts are unaccustomed to using directional vectors, having interpretable formulas for special cases will help them understand and appreciate the results. The foundation of the local influence methodology is a Taylor series approximation to a high-dimensional surface, combined with some familiar geometry concepts: slopes and accelerations. Using these ideas, the generalization to a class of influence measures for optimal estimating functions in Chapter 4 is straightforward. This extension of the methodology provides many avenues for future research; quasi-likelihood and mixed model estimation are only the beginning of the areas of application.
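The block-triangular simplification in (6.1) and (6.2) of Section 6.1 is easy to confirm numerically. The sketch below uses arbitrary random matrices (not quantities from any fitted model) and checks the partitioned form of J against direct solution of the stacked system:

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, q = 2, 3, 4

# Jacobian of the stacked estimating equations: block lower-triangular,
# because h1 does not involve theta2.
h11 = rng.normal(size=(p1, p1)) + p1 * np.eye(p1)
h21 = rng.normal(size=(p2, p1))
h22 = rng.normal(size=(p2, p2)) + p2 * np.eye(p2)
hdot = np.block([[h11, np.zeros((p1, p2))], [h21, h22]])

Delta1 = rng.normal(size=(p1, q))  # d h1 / d omega'
Delta2 = rng.normal(size=(p2, q))  # d h2 / d omega'

# Direct computation: J = -hdot^{-1} Delta
J = -np.linalg.solve(hdot, np.vstack([Delta1, Delta2]))

# Partitioned form, as in (6.2)
J1 = -np.linalg.solve(h11, Delta1)
J2 = np.linalg.solve(h22, h21 @ np.linalg.solve(h11, Delta1) - Delta2)

assert np.allclose(J, np.vstack([J1, J2]))
```

The partitioned form avoids inverting the full stacked Jacobian and makes explicit that the stage-one block of J does not involve the stage-two equations at all.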
As for other areas of research, the idea of perturbations is an excellent concept for linking influence diagnostics, sensitivity analysis, robust inference, and aspects of optimal design. Robust estimation techniques are those that yield inferences that are largely insensitive to perturbations. Consequently, minor aspects of the data, model, and inference technique do not overly influence scientific conclusions. If a perturbation and influence measure are appropriate for several methods of inference, the maximum local influence could be used to compare the techniques' robustness. In fact, minimizing the maximum local influence could even serve as a criterion for robustness (minimax local influence estimation). Optimal sampling and experimental designs, which are efficient under assumed conditions, should also be designed so that they are insensitive to perturbation. Local assessment could be used to assess how sampling and experimental designs can ameliorate or compound influential phenomena.

APPENDIX A
MODEL FITTING INFORMATION FOR SCHOOL DATA

NOTE: No intercept in model. R-square is redefined.

Dependent Variable: VSCORE

Analysis of Variance

Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model      5         17.21982       3.44396    29.019    0.0001
Error     15          1.78018       0.11868
U Total   20         19.00000

Root MSE    0.34450    R-square   0.9063
Dep Mean   -0.00000    Adj R-sq   0.8751
C.V.       -6.745588E16

Parameter Estimates

Variable   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
SALARY     -0.139993            0.09301762       -1.505                  0.1531
WCOLR       0.194122            0.22907761        0.847                  0.4101
SES         0.919607            0.14859782        6.189                  0.0001
TSCORE      0.250736            0.09464627        2.649                  0.0182
MOMED      -0.203697            0.22031344       -0.925                  0.3698

Variable   Variance Inflation
SALARY     1.38519879
WCOLR      8.40130895
SES        3.53513942
TSCORE     1.43413046
MOMED      7.77076288

Collinearity Diagnostics

Number   Eigenvalue   Condition   Var Prop   Var Prop   Var Prop   Var Prop   Var Prop
                      Index       SALARY     WCOLR      SES        TSCORE     MOMED
1        2.83681      1.00000     0.0134     0.0129     0.0296     0.0071     0.0142
2        1.39508      1.42598     0.2192     0.0040     0.0020     0.2456     0.0026
3        0.49664      2.38997     0.7601     0.0006     0.0064     0.6485     0.0004
4        0.20253      3.74262     0.0011     0.0655     0.9471     0.0418     0.1258
5        0.06894      6.41490     0.0062     0.9170     0.0149     0.0570     0.8570

Obs   Dep Var   Predict   Std Err   Residual   Std Err    Student
      VSCORE    Value     Predict              Residual   Residual
 1     0.3314    0.2714    0.227     0.0600    0.260       0.231
 2    -1.4737   -1.4135    0.228    -0.0602    0.259      -0.233
 3     0.2454    0.9244    0.099    -0.6790    0.330      -2.058
 4     0.9657    1.0471    0.120    -0.0814    0.323      -0.252
 5     0.3468    0.2126    0.123     0.1342    0.322       0.417
 6    -0.2033   -0.1885    0.231    -0.0147    0.256      -0.058
 7     1.1548    1.0312    0.150     0.1236    0.310       0.398
 8    -0.2892   -0.2146    0.083    -0.0747    0.334      -0.223
 9     1.0190    0.9116    0.167     0.1073    0.301       0.356
10     0.3640    0.3278    0.260     0.0362    0.226       0.160
11    -2.0255   -1.6459    0.169    -0.3796    0.300      -1.265
12     0.0202   -0.2800    0.205     0.3002    0.277       1.083
13    -0.0314    0.1489    0.195    -0.1803    0.284      -0.634
14    -0.3408   -0.2814    0.084    -0.0594    0.334      -0.178
15    -2.1287   -1.8229    0.188    -0.3058    0.289      -1.058
16     0.7938    0.5707    0.113     0.2230    0.326       0.685
17    -0.5643   -0.3168    0.169    -0.2475    0.300      -0.825
18    -0.5815   -1.4413    0.181     0.8598    0.293       2.932
19     1.3783    1.1853    0.167     0.1930    0.301       0.640
20     1.0190    0.9738    0.143     0.0452    0.313       0.144

Obs   Cook's D   Rstudent   Hat Diag H   Cov Ratio   Dffits
 1     0.008      0.2236     0.4325       2.4440      0.1952
 2     0.008     -0.2251     0.4362       2.4596     -0.1980
 3     0.077     -2.3475     0.0831       0.2929     -0.7067
 4     0.002     -0.2441     0.1212       1.5729     -0.0906
 5     0.005      0.4055     0.1281       1.5276      0.1555
 6     0.001     -0.0557     0.4499       2.5639     -0.0504
 7     0.007      0.3868     0.1891       1.6511      0.1868
 8     0.001     -0.2160     0.0575       1.4734     -0.0534
 9     0.008      0.3457     0.2354       1.7697      0.1918
10     0.007      0.1545     0.5680       3.2410      0.1771
11     0.102     -1.2931     0.2414       1.0588     -0.7295
12     0.128      1.0900     0.3527       1.4516      0.8046
13     0.038     -0.6212     0.3193       1.8106     -0.4254
14     0.000     -0.1720     0.0588       1.4844     -0.0430
15     0.094     -1.0629     0.2964       1.3615     -0.6899
16     0.011      0.6725     0.1069       1.3486      0.2326
17     0.043     -0.8152     0.2406       1.4745     -0.4589
18     0.654      4.3370     0.2756       0.0276      2.6748
19     0.025      0.6272     0.2346       1.6059      0.3472
20     0.001      0.1394     0.1727       1.6948      0.0637

Obs   SALARY    WCOLR     SES       TSCORE    MOMED
      Dfbetas   Dfbetas   Dfbetas   Dfbetas   Dfbetas
 1     0.1627   -0.0882    0.0679   -0.0276    0.0339
 2    -0.0660   -0.1443    0.0717    0.0126    0.1444
 3     0.0734   -0.0199    0.0654   -0.2036   -0.2487
 4    -0.0033    0.0428   -0.0220   -0.0051   -0.0563
 5     0.0769   -0.1122    0.1070   -0.0453    0.0394
 6     0.0036    0.0155   -0.0154    0.0418   -0.0105
 7    -0.0748    0.1413   -0.0036    0.0380   -0.0973
 8     0.0312   -0.0014   -0.0220   -0.0142    0.0250
 9     0.0130    0.1498   -0.0105    0.1046   -0.1325
10    -0.1007   -0.0145    0.0400    0.1489   -0.0279
11     0.2465    0.0798    0.3961    0.1611   -0.2056
12     0.0446    0.2661    0.4139   -0.2827   -0.6298
13     0.1588    0.2764   -0.3238    0.0694   -0.0699
14     0.0169    0.0145    0.0204   -0.0276   -0.0236
15    -0.0457   -0.0462    0.6017   -0.1181   -0.1775
16     0.1471    0.0342   -0.0379   -0.0840    0.0415
17    -0.4650    0.0429    0.0595    0.2269   -0.0573
18    -0.2034   -1.1197   -1.8118    0.0013    1.9401
19    -0.1005    0.0353   -0.1001    0.0803    0.1385
20    -0.0336    0.0341   -0.0005    0.0048   -0.0165

APPENDIX B
MODEL FITTING INFORMATION FOR CORN CROP DATA

The MIXED Procedure

Class Level Information

Class     Levels   Values
CNTYNUM   12       1 2 3 4 5 6 7 8 9 10 11 12

REML Estimation Iteration History

Iteration   Evaluations   Objective      Criterion
0           1             243.72248590
1           2             242.59476305   0.00000000

Convergence criteria met.

Covariance Parameter Estimates (REML)

Cov Parm   Estimate       Std Error     Z      Pr > |Z|
CNTYNUM    63.28953258    75.07490859   0.84   0.3992
Residual   297.72802207   84.64939011   3.52   0.0004

Asymptotic Covariance Matrix of Estimates

Cov Parm   Row   COVP1          COVP2
CNTYNUM    1     5636.2419002   -2123.731245
Residual   2     -2123.731245   7148.5993683

Model Fitting Information for HECTCORN

Description                      Value
Observations                     37
Res Log Likelihood               -152.541
Akaike's Information Criterion   -154.541
Schwarz's Bayesian Criterion     -156.068
-2 Res Log Likelihood            305.0826

Solution for Fixed Effects

Effect      Estimate       Std Error    DF   t       Pr > |t|
INTERCEPT   120.74028537   3.78206463   11   31.92   0.0001
PIX_CORN    26.76987997    4.56957580   23   5.64    0.0001
PIX_SOY     -2.04732793    4.56652013   23   -0.45   0.6574

Solution for Random Effects

Effect    CNTYNUM   Estimate      SE Pred      DF   t       Pr > |t|
CNTYNUM   1          2.18380161   7.35943306   23    0.30   0.7693
CNTYNUM   2          1.47455727   7.30784520   23    0.20   0.8419
CNTYNUM   3         -4.72910256   7.26641831   23   -0.65   0.5216
CNTYNUM   4         -2.76387525   6.88938048   23   -0.40   0.6920
CNTYNUM   5          8.36866398   6.42653277   23    1.30   0.2057
CNTYNUM   6          4.27362586   6.52255117   23    0.66   0.5188
CNTYNUM   7         -2.70476376   6.50319394   23   -0.42   0.6813
CNTYNUM   8          1.15642520   6.47806611   23    0.18   0.8599
CNTYNUM   9          5.02562759   6.21203551   23    0.81   0.4268
CNTYNUM   10        -2.88274430   5.92509787   23   -0.49   0.6312
CNTYNUM   11        -8.65058657   5.88291972   23   -1.47   0.1550
CNTYNUM   12        -0.75162907   5.69287011   23   -0.13   0.8961

Tests of Fixed Effects

Source     NDF   DDF   Type III F   Pr > F
PIX_CORN   1     23    31.80        0.0001
PIX_SOY    1     23    0.20         0.6574

Predicted Values

HECTCORN   Predicted   SE Pred   L95        U95        Residual
165.7600   155.4869    9.7838    135.2477   175.7262    10.2731
 96.3200    89.3834    8.8869     70.9994   107.7674     6.9366
 76.0800    98.3267    8.1004     81.5698   115.0837   -22.2467
185.3500   170.5417    8.5361    152.8834   188.2000    14.8083
116.4300   144.2402    7.5387    128.6451   159.8353   -27.8102
162.0800   154.4196    7.0506    139.8343   169.0049     7.6604
152.0400   125.5822    6.7333    111.6532   139.5112    26.4578
161.7500   156.5001    7.1277    141.7553   171.2449     5.2499
 92.8800    91.0834    7.9300     74.6789   107.4879     1.7966
149.9400   131.2890    7.2587    116.2733   146.3048    18.6510
 64.7500    65.0934    8.7890     46.9121    83.2748    -0.3434
127.0700   141.4215    7.1954    126.5367   156.3062   -14.3515
133.5500   118.8645    7.1089    104.1587   133.5704    14.6855
 77.7000    90.7578    7.6708     74.8896   106.6260   -13.0578
206.3900   184.9299    9.1302    166.0426   203.8172    21.4601
108.3300   118.7686    6.7585    104.7876   132.7496   -10.4386
118.1700   123.7514    7.6313    107.9648   139.5380    -5.5814
 99.9600   106.1059    7.3867     90.8254   121.3864    -6.1459
140.4300   123.6154    6.2159    110.7567   136.4740    16.8146
 98.9500    91.7140    8.0456     75.0704   108.3575     7.2360
131.0400   125.3031    7.5450    109.6951   140.9111     5.7369
114.1200   123.9749    5.9584    111.6490   136.3009    -9.8549
100.6000    97.0015    6.3382     83.8900   110.1130     3.5985
127.8800   139.1748    6.4405    125.8516   152.4980   -11.2948
116.9000   107.4351    5.9034     95.2230   119.6473     9.4649
 87.4100    92.8847    6.9265     78.5562   107.2132    -5.4747
 93.4800    85.2028    9.2386     66.0913   104.3142     8.2772
121.0000   138.6914    6.9531    124.3079   153.0749   -17.6914
109.9100   127.4057    8.0247    110.8053   144.0061   -17.4957
122.6600   129.0737    6.0804    116.4955   141.6519    -6.4137
104.2100   111.5808    6.1297     98.9005   124.2610    -7.3708
 88.5900    89.8508    6.3707     76.6720   103.0297    -1.2608
 88.5900   139.1245    8.2468    122.0648   156.1843   -50.5345
165.3500   142.4030    6.2021    129.5729   155.2330    22.9470
104.0000   106.1154    5.7054     94.3129   117.9179    -2.1154
 88.6300    75.2417    8.4331     57.7964    92.6869    13.3883
163.7000   139.6604    6.3394    126.5464   152.7744    14.0396

REFERENCES

Atkinson, A. C. (1986), "Comments on 'Influential Observations, High Leverage Points, and Outliers in Linear Regression'," Statistical Science, 1, 397-402.

Barlow, W. E. (1997), "Global Measures of Local Influence for Proportional Hazards Regression Models," Biometrics, 53, 1157-1162.

Battese, G. E., Harter, R. M., and Fuller, W. A.
(1988), "An Error-components Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28-36.

Beckman, R. J., Nachtsheim, C. J., and Cook, R. D. (1987), "Diagnostics for Mixed-model Analysis of Variance," Technometrics, 29, 413-426.

Belsley, D. A., Kuh, E., and Welsch, R. E. (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: Wiley.

Bhapkar, V. P. (1972), "On a Measure of Efficiency of An Estimating Equation," Sankhyā, Series A, Indian Journal of Statistics, 34, 467-472.

Billor, N. and Loynes, R. M. (1993), "Local Influence: A New Approach," Communications in Statistics, Part A - Theory and Methods, 22, 1595-1611.

Bjørnstad, J. F. (1990), "Predictive Likelihood: A Review (with discussion)," Statistical Science, 5, 242-265.

Booth, J. G. and Hobert, J. P. (1998), "Standard Errors of Prediction in Generalized Linear Mixed Models," Journal of the American Statistical Association, 93, 262-272.

Box, G. E. P. (1982), "Choice of Response Surface Design and Alphabetic Optimality," Utilitas Mathematica, 21, 11-55.

Brown, G. C. (1997), "Local Influence Regression Diagnostics for Gamma Data," Communications in Statistics, Part A - Theory and Methods, 26, 3051-3067.

Butler, R. W. (1986), "Predictive Likelihood Inference With Applications (with discussion)," Journal of the Royal Statistical Society, Series B, 48, 1-38.

Cadigan, N. G. (1995), "Local Influence in Structural Equation Models," Structural Equation Modeling, 2, 13-30.

Cadigan, N. G. and Farrell, P. J. (1999), "Expected Local Influence in the Normal Linear Regression Model," Statistics & Probability Letters, 41, 25-30.

Cook, R. D. (1977), "Detection of Influential Observation in Linear Regression," Technometrics, 19, 15-18.

Cook, R. D. (1979), "Influential Observations in Linear Regression," Journal of the American Statistical Association, 74, 169-174.

Cook, R. D.
(1986), "Assessment of Local Influence (with discussion)," Journal of the Royal Statistical Society, Series B, 48, 133-169.

Cook, R. D. and Weisberg, S. (1991), "Added Variable Plots in Linear Regression," In Stahel, W. and Weisberg, S., editors, Directions in Robust Statistics and Diagnostics, Part I, pages 47-60, New York: Springer-Verlag.

Cox, D. R. and Reid, N. (1987), "Parameter Orthogonality and Approximate Conditional Inference (with discussion)," Journal of the Royal Statistical Society, Series B, 49, 1-39.

Crowder, M. (1986), "On Consistency and Inconsistency of Estimating Equations," Econometric Theory, 2, 305-330.

Crowder, M. (1987), "On Linear and Quadratic Estimating Functions," Biometrika, 74, 591-597.

Desmond, A. F. (1997a), "Optimal Estimating Functions, Quasi-likelihood and Statistical Modelling (with discussion)," Journal of Statistical Planning and Inference, 60, 77-121.

Desmond, A. F. (1997b), "Reply to Comments on 'Optimal Estimating Functions, Quasi-likelihood and Statistical Modelling'," Journal of Statistical Planning and Inference, 60, 116-121.

Escobar, L. A. and Meeker, W. Q., Jr. (1992), "Assessing Influence in Regression Analysis With Censored Data," Biometrics, 48, 507-528.

Farebrother, R. W. (1992), "Relative Local Influence and the Condition Number," Communications in Statistics, Part B - Simulation and Computation, 21, 707-710.

Firth, D. (1987), "On the Efficiency of Quasi-likelihood Estimation," Biometrika, 74, 233-245.

Fuller, W. A. (1987), Measurement Error Models, New York: Wiley.

Fung, W. K. and Kwan, C. W. (1997), "A Note on Local Influence Based on Normal Curvature," Journal of the Royal Statistical Society, Series B, 59, 839-843.

Fung, W. K. and Tang, M. K. (1997), "Assessment of Local Influence in Multivariate Regression Analysis," Communications in Statistics, Part A - Theory and Methods, 26, 821-837.

Galea, M., Paula, G. A., and Bolfarine, H.
(1997), "Local Influence in Elliptical Linear Regression Models," The Statistician, 46, 71-79.

Geisser, S. (1991), "Diagnostics, Divergences and Perturbation Analysis," In Stahel, W. and Weisberg, S., editors, Directions in Robust Statistics and Diagnostics, Part I, pages 89-100, New York: Springer-Verlag.

Geisser, S. (1993), Predictive Inference: An Introduction, New York: Chapman & Hall.

Godambe, V. P. (1960), "An Optimum Property of Regular Maximum Likelihood Estimation," The Annals of Mathematical Statistics, 31, 1208-1212.

Godambe, V. P. (1976), "Conditional Likelihood and Unconditional Optimum Estimating Equations," Biometrika, 63, 277-284.

Godambe, V. P. (1991), "Orthogonality of Estimating Functions and Nuisance Parameters," Biometrika, 78, 143-151.

Godambe, V. P. and Heyde, C. C. (1987), "Quasi-likelihood and Optimal Estimation," International Statistical Review, 55, 231-244.

Godambe, V. P. and Thompson, M. E. (1989), "An Extension of Quasi-likelihood Estimation (with discussion)," Journal of Statistical Planning and Inference, 22, 137-172.

Henderson, C. R., Kempthorne, O., Searle, S. R., and von Krosigk, C. N. (1959), "Estimation of Environmental and Genetic Trends from Records Subject to Culling," Biometrics, 15, 192-218.

Hilden-Minton, J. A. (1995), Multilevel Diagnostics for Mixed and Hierarchical Linear Models, PhD thesis, The University of California, Los Angeles.

Huynh, H. (1982), "A Comparison of Four Approaches to Robust Regression," Psychological Bulletin, 92, 505-512.

Johnson, A. R. and Wichern, D. W. (1992), Applied Multivariate Statistical Analysis (3rd ed.), Englewood Cliffs, NJ: Prentice-Hall.

Jung, K.-M., Kim, M. G., and Kim, B. C. (1997), "Local Influence in Maximum Likelihood Factor Analysis," Computational Statistics and Data Analysis, 24, 483-491.

Kass, R. E., Tierney, L., and Kadane, J. B. (1989), "Approximate Methods for Assessing Influence and Sensitivity in Bayesian Analysis," Biometrika, 76, 663-674.

Kim, C.
(1996a), "Local Influence and Replacement Measure," Communications in Statistics, Part A - Theory and Methods, 25, 49-61.

Kim, M. G. (1995), "Local Influence in Multivariate Regression," Communications in Statistics, Part A - Theory and Methods, 24, 1271-1278.

Kim, M. G. (1996b), "Local Influence in Box-Cox Transformation for Multivariate Regression Data," Communications in Statistics, Part A - Theory and Methods, 25, 2955-2969.

Kim, M. G. (1996c), "Local Influence in Multivariate Normal Data," Journal of Applied Statistics, 23, 535-541.

Kim, M. G. (1998), "Local Influence on a Test of Linear Hypothesis in Multiple Regression Model," Journal of Applied Statistics, 25, 145-152.

Kim, Y.-I. (1991), "Local Influence Approach Diagnostics for Optimal Experimental Design," Korean Journal of Applied Statistics, 4, 195-207.

Lavine, M. (1992), "Local Predictive Influence in Bayesian Linear Models With Conjugate Priors," Communications in Statistics, Part B - Simulation and Computation, 21, 269-283.

Lawrance, A. J. (1988), "Regression Transformation Diagnostics Using Local Influence," Journal of the American Statistical Association, 83, 1067-1072.

Lawrance, A. J. (1991), "Local and Deletion Influence," In Stahel, W. and Weisberg, S., editors, Directions in Robust Statistics and Diagnostics, Part I, pages 141-157, New York: Springer-Verlag.

Lee, A. H. and Zhao, Y. (1996), "Assessing Local Influence in Measurement Error Models," Biometrical Journal, 38, 829-841.

Lee, A. H. and Zhao, Y. (1997), "Assessing Influence on the Goodness-of-link in Generalized Linear Models," Statistics & Probability Letters, 31, 351-358.

Lee, Y. and Nelder, J. A. (1996), "Hierarchical Generalized Linear Models (with discussion)," Journal of the Royal Statistical Society, Series B, 58, 619-678.

Lejeune, M. and Faulkenberry, G. D.
(1982), "A Simple Predictive Density Function (with discussion)," Journal of the American Statistical Association, 77, 654-659.

Lesaffre, E., Asefa, M., and Verbeke, G. (1999), "Assessing the Goodness-of-Fit of the Laird and Ware Model, an Example: The Jimma Infant Survival Differential Longitudinal Study," Statistics in Medicine, 18, 835-854.

Lesaffre, E. and Verbeke, G. (1998), "Local Influence in Linear Mixed Models," Biometrics, 54, 570-582.

Levy, M. S. and Perng, S. K. (1984), "A Maximum Likelihood Prediction Function for the Linear Model With Consistency Results," Communications in Statistics, Part A - Theory and Methods, 13, 1257-1273.

Liang, K.-Y. and Zeger, S. L. (1986), "Longitudinal Data Analysis Using Generalized Linear Models," Biometrika, 73, 13-22.

Lindsay, B. (1982), "Conditional Score Functions: Some Optimality Results," Biometrika, 69, 503-512.

Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. (1996), SAS System for Mixed Models, Cary, NC: SAS Institute.

Loynes, R. M. (1986), "Comments on 'Assessment of Local Influence'," Journal of the Royal Statistical Society, Series B, 48, 156-157.

Lu, J., Ko, D., and Chang, T. (1997), "The Standardized Influence Matrix and Its Applications," Journal of the American Statistical Association, 92, 1572-1580.

Mathiasen, P. E. (1979), "Prediction Functions," Scandinavian Journal of Statistics, 6, 1-21.

McCullagh, P. (1983), "Quasi-likelihood Functions," The Annals of Statistics, 11, 59-67.

McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models (2nd ed.), London: Chapman & Hall.

McCulloch, C. E. (1997), "Maximum Likelihood Algorithms for Generalized Linear Mixed Models," Journal of the American Statistical Association, 92, 162-170.

McCulloch, R. E. (1989), "Local Model Influence," Journal of the American Statistical Association, 84, 473-478.

Morton, R. (1981), "Efficiency of Estimating Equations and the Use of Pivots," Biometrika, 68, 227-233.

Pan, J.-X., Fang, K.-T., and Liski, E.
P. (1996), "Bayesian Local Influence for the Growth Curve Model With Rao's Simple Covariance Structure," Journal of Multivariate Analysis, 58, 55-81.

Pan, J.-X., Fang, K.-T., and von Rosen, D. (1997), "Local Influence Assessment in the Growth Curve Model With Unstructured Covariance," Journal of Statistical Planning and Inference, 62, 263-278.

Patterson, H. D. and Thompson, R. (1971), "Recovery of Inter-block Information When Block Sizes Are Unequal," Biometrika, 58, 545-554.

Pettitt, A. N. and Daud, I. B. (1989), "Case-weighted Measures of Influence for Proportional Hazards Regression," Applied Statistics, 38, 51-67.

Poon, W.-Y. and Poon, Y. S. (1999), "Conformal Normal Curvature and Assessment of Local Influence," Journal of the Royal Statistical Society, Series B, 61, 51-61.

Prentice, R. L. (1988), "Correlated Binary Regression With Covariates Specific to Each Binary Observation," Biometrics, 44, 1033-1048.

Rancel, M. M. S. and Sierra, M. A. G. (1999), "Measures and Procedures for the Identification of Locally Influential Observations in Linear Regression," Communications in Statistics, Part A - Theory and Methods, 28, 343-366.

Robinson, D. L. (1987), "Estimation and Use of Variance Components," The Statistician, 36, 3-14.

Schall, R. and Dunne, T. T. (1992), "A Note on the Relationship Between Parameter Collinearity and Local Influence," Biometrika, 79, 399-404.

Schall, R. and Gonin, R. (1991), "Diagnostics for Nonlinear Lp-norm Estimation," Computational Statistics and Data Analysis, 11, 189-198.

Schwarzmann, B. (1991), "A Connection Between Local-influence Analysis and Residual Diagnostics," Technometrics, 33, 103-104.

Searle, S. R. (1971), Linear Models, New York: Wiley.

Searle, S. R. (1982), Matrix Algebra Useful for Statistics, New York: Wiley.

Searle, S. R., Casella, G., and McCulloch, C. E. (1992), Variance Components, New York: Wiley.

Shi, L. (1997), "Local Influence in Principal Components Analysis," Biometrika, 84, 175-186.

St. Laurent, R. T.
and Cook, R. D. (1993), "Leverage, Local Influence and Curvature in Nonlinear Regression," Biometrika, 80, 99-106.

Swokowski, E. W. (1988), Calculus with Analytic Geometry (2nd ed.), Boston: PWS-Kent.

Tang, M. K. and Fung, W. K. (1996), "First Order Local Influence of Test Statistics in Multivariate Regression," Sankhyā, Series B, Indian Journal of Statistics, 58, 323-337.

Tawn, J. (1987), "Comments on 'Parameter Orthogonality and Approximate Conditional Inference'," Journal of the Royal Statistical Society, Series B, 49, 33-34.

Thomas, W. (1990), "Influence on Confidence Regions for Regression Coefficients in Generalized Linear Models," Journal of the American Statistical Association, 85, 393-397.

Thomas, W. (1991), "Influence Diagnostics for the Cross-validated Smoothing Parameter in Spline Smoothing," Journal of the American Statistical Association, 86, 693-698.

Thomas, W. and Cook, R. D. (1989), "Assessing Influence on Regression Coefficients in Generalized Linear Models," Biometrika, 76, 741-749.

Thomas, W. and Cook, R. D. (1990), "Assessing Influence on Predictions From Generalized Linear Models," Technometrics, 32, 59-65.

Tsai, C.-L. and Wu, X. (1992), "Assessing Local Influence in Linear Regression Models With First-order Autoregressive Or Heteroscedastic Error Structure," Statistics & Probability Letters, 14, 247-252.

Vos, P. W. (1991), "A Geometric Approach to Detecting Influential Cases," The Annals of Statistics, 19, 1570-1581.

Wang, S.-J. and Lee, S.-Y. (1996), "Sensitivity Analysis of Structural Equation Models With Equality Functional Constraints," Computational Statistics and Data Analysis, 23, 239-256.

Wang, X., Ren, S., and Shi, L. (1996), "Local Influence in Discriminant Analysis," Journal of Systems Science and Mathematical Science, 8, 27-36.

Wedderburn, R. W. M. (1974), "Quasi-likelihood Functions, Generalized Linear Models, and the Gauss-Newton Method," Biometrika, 61, 439-447.

Weiss, R.
(1996), "An Approach to Bayesian Sensitivity Analysis," Journal of the Royal Statistical Society, Series B, 58, 739 750. Weissfeld, L. A. (1990), "Influence Diagnostics for the Proportional Hazards Model," Statistics & Probability Letters, 10, 411-417. Weissfeld, L. A. and Schneider, H. (1990), "Influence Diagnostics for the WeibuU Model Fit to Censored Data," Statistics & Probability Letters, 9, 67-73. Wu, X. and Luo, Z. (1993a), "Residual Sum of Squares and Multiple Potential, Diagnostics By a Second Order Local Approach," Statistics & Probability Letters, 16, 289-296. Wu, X. and Luo, Z. (1993b), "Residual Sum of Squares and Multiple Potential, Diagnostics By a Second Order Local Approach," Statistics & Probability Letters, 16, 289-296. Wu, X. and Luo, Z. (1993c), "Second-Order Approach to Local Influence," Journal of the Royal Statistical Society, Series B, 55, 929-936. Wu, X. and Wan, F. (1994), "A Perturbation Scheme for Nonlinear Models," Statistics & Probability Letters, 20, 197-202. aoi Zeger, S. L. and Liang, K.-Y. (1986), "Longitudinal Data Analysis for Discrete and Continuous Outcomes," Biometrics, 42, 121-130. Zhao, Y. and Lee, A. H. (1995), "Assessment of Influence in Non-linear Measurement Error Models," Journal of Applied Statistics, 22, 215-225. BIOGRAPHICAL SKETCH Glen Hartless was born in Summit, New Jersey in 1971. As he grew up, Glen always had an interest in "keeping the numbers." While researching careers in applied mathematics in the tenth grade, he stumbled upon the statistics profession and knew that it was perfect for him. (Considering the content of his dissertation, it is ironic that frustration with the abstractness of matrices is what led him to the field.) Interestingly, while he was still in high school, his father arranged for him to meet Daryl Pregibon, a pioneer in developing diagnostics for logistic regression. Perhaps this chance meeting influenced Glen more then he knows. 
As an undergraduate at Virginia Tech, he completed a thesis on nonparametric density estimation. Glen also studied exploratory graphical techniques for analyzing reliability data while at an NSF-sponsored Research Experience for Undergraduates (REU) program at the College of William & Mary. After receiving his B.S. in statistics from Virginia Tech in 1994, he attended the University of Florida in pursuit of his Master of Statistics and Ph.D. degrees. There, he did collaborative research for four years in the Department of Family, Youth & Community Sciences. Glen also met and married his wife, Christine, in Gainesville.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

James G. Booth, Chairman
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Ramon C. Littell, Cochairman
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

James Hobert
Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Randolph L. Carter
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

James J. Algina
Professor of Educational Psychology

This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 2000

Dean, Graduate School